REMARKS 



A33626-A 067252.0107 
PATENT 



This paper is being filed in response to the Office Action dated June 3, 2003 that 
was issued in the above-identified application. Applicants respectfully request continued 
examination of the instant application pursuant to 35 U.S.C. § 132(a) and 37 C.F.R. § 1.114(a)(2) 
and enclose herewith the fee required pursuant to 37 C.F.R. § 1.17(e). Applicants also request a 
three-month extension of time and enclose the fee required under 37 C.F.R. § 1.17(a)(3). 
Applicants further enclose herewith a Third Substitute Sequence Listing in paper and computer 
readable form in accordance with 37 C.F.R. §§ 1.821 to 1.825 and a Terminal Disclaimer under 
37 C.F.R. § 1.321(b). Applicants respectfully request reconsideration of the above-identified 
application in light of the amendments and remarks presented in the instant Amendment. 

Claims 42-51, 53, 55-56, 82, and 85-86 are pending. Claims 42-45, 53, and 55 
have been amended. Dependent claim 42 has been amended to correspond to independent claim 
43 and to recite "isolated or purified". Independent claim 43 has been amended to correspond to 
dependent claim 42. Claims 44-45 and 55 have been amended to refer to claim 42 instead of 
claim 43. Therefore, these amendments do not constitute new matter. Amended claim 53 is 
supported by the specification as originally filed, for example, by Example 10 and, therefore, 
does not constitute new matter. Upon entry of the instant Amendment, claims 42-51, 53, 55-56, 
82, and 85-86 will continue to be pending. 

As a preliminary matter, Applicants thank the Examiner for withdrawing earlier 
objections to the specification and claims. Applicants further thank the Examiner for 
withdrawing many of the earlier rejections of the claims, including rejections asserting that the claimed 
invention lacks utility, is not enabled by the specification, is indefinite, and is anticipated by 
Campbell (1993, J. Clin. Microbiol. 31:2255-2262), Smith (1998, Toxicol 36:1539-1548), 
Halpern (1993, JBC 268:11186-11192), Whelan (Accession No. M81186), Jung (1992, FEMS 
Microbiol. Lett. 91:69-72), and Williams (U.S. Patent No. 5,919,665). 

Amendments Are Fully Supported 

The specification has been amended to recite Accession Nos. X52066 and 
M81186. This amendment is supported by the specification as originally filed. For example, the 
specification originally cited Thompson et al., 1990, European Journal of Biochemistry 189:73- 
81 at page 12, lines 9-10. The botulinum neurotoxin serotype A sequence disclosed in Figure 3 
(pp. 76-77) of this document was deposited in Genbank and assigned Accession No. X52066. 
See Accession No. X52066 (annotations recite Thompson et al. as the only "reference")(Exhibit 
1). Additionally, the specification originally cited Whelan et al., 1992, Applied and 
Environmental Microbiology 58:2345-2354 at page 13, line 1. This article recites Accession No. 
M81186 at page 2346, second column, second full paragraph. 

SEQ ID NOS: 7, 37, 39, 40, 41, and 42 have been amended herein. SEQ ID NO:7 
has been amended to agree with Figure 4 as originally filed with the instant application. 
Therefore, this amendment does not constitute new matter. 

SEQ ID NO:37 has been amended to correspond to the sequence shown in Figure 
2 of U.S. Patent Application No. 08/123,975 by Middlebrook et al. filed on September 21, 1993 
(hereinafter "the '975 application") and to which the instant application claims priority. 
Therefore, this amendment does not constitute new matter. 

SEQ ID NO:39 has been amended to correspond to the sequence shown in Figure 
4 of the '975 application. Therefore, this amendment does not constitute new matter. 

SEQ ID NO:40 has been amended to agree with Figure 3 on page 2349 of Whelan 
(Accession No. M81186)(hereinafter "Whelan"). SEQ ID NO:41 has been amended to agree 
with Figure 3 on pages 76-77 of Thompson (Accession No. X52066)(hereinafter "Thompson"). 
SEQ ID NO:42 has been amended to agree with Whelan. Applicants assert that Thompson and 
Whelan were available to those of ordinary skill in the art at the time the instant application was 
filed. Specifically, the National Center for Biotechnology Information indicates that Thompson 
was "first seen" on April 21, 1993 (Exhibit 1) and Whelan was "first seen" on April 26, 1993 
(Exhibit 2). In addition, the specification has been amended to specifically incorporate 
Thompson and Whelan by reference. In this regard, Applicants respectfully invite the 
Examiner's attention to page 44, lines 18-20 of the specification, which states "All publications 
and patent applications are herein incorporated by reference to the same extent as if each 
individual publication or patent application was specifically and individually indicated to be 
incorporated by reference." Therefore, these amendments do not constitute new matter. 

Declarations 

Applicants submit herewith a Third Substitute Sequence Listing in paper and 
computer readable form. I hereby state that the content of the paper and computer readable 
copies of the Third Substitute Sequence Listing, submitted in accordance with 37 C.F.R. 
§ 1.821(c) and (e), is the same. I hereby state that the content of the paper and computer 
readable copies of the Third Substitute Sequence Listing, submitted herein in accordance with 37 
C.F.R. § 1.821(g), does not include new matter. 

Applicants' Third Substitute Sequence Listing corrects typographical errors in the 
sequences presented in the original application and in the Amendment and Substitute Sequence 
Listing filed on March 5, 2002. Applicants enclose herewith six sequence alignments for the 
Examiner's review. The original sequence, the sequence substituted on March 5, 2002 
(hereinafter "the Substitute sequence"), and the sequence as amended herein (hereinafter "the 
Amended sequence") are aligned along with the sequence relied upon to support the instant 
amendment. Positions where the sequences differ are highlighted, while positions where all 
aligned sequences match are marked with an asterisk. These alignments were prepared with 
ClustalW 1.74, accessed at <http://www.ch.embnet.org/software/ClustalW.html>, with default 
settings. 
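
By way of illustration only, the match-marking convention described above can be sketched in a few lines of Python. This is a minimal sketch of the convention, not ClustalW's own code; the function name `consensus_line` is hypothetical, and the input is assumed to be sequences that have already been aligned to equal length.

```python
def consensus_line(aligned):
    """Build a marker line for already-aligned sequences.

    A column gets '*' when every sequence has the same character there,
    and a space otherwise (such a column would be highlighted as a difference).
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    # zip(*aligned) walks the alignment column by column.
    return "".join(
        "*" if len(set(column)) == 1 else " "
        for column in zip(*aligned)
    )


if __name__ == "__main__":
    rows = ["GAATTC", "GAATTC", "GACTTC"]  # toy data, not taken from the alignments
    for row in rows:
        print(row)
    print(consensus_line(rows))
```

A column containing a gap character in only some rows is treated as a difference, since the characters in that column are not all identical.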

Applicants have amended the specification to incorporate the Thompson and 
Whelan sequences in the sequence listing as SEQ ID NOS:43 and 44, respectively. Both 
Thompson and Whelan sequences were incorporated by reference into the instant application. 
See page 44, lines 18-20. I hereby declare that SEQ ID NO: 40, as amended herein, is the same 
as Whelan amino acids 853 to 1291. I hereby declare that SEQ ID NO: 41, as amended herein, is 
the same as Thompson amino acids 449 to 1296. I hereby declare that SEQ ID NO: 42, as 
amended herein, is the same as Whelan amino acids 442 to 1291. 
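
The kind of identity stated above (an amended sequence being the same as a numbered range of residues in a reference sequence) can be checked mechanically. The following is a minimal sketch, assuming residue positions are 1-based and inclusive as in the ranges recited above; the function name `matches_region` and the toy sequence are hypothetical and are not taken from the sequence listing.

```python
def matches_region(candidate, reference, start, end):
    """Return True if `candidate` equals residues start..end of `reference`.

    Positions are 1-based and inclusive, so the region holds
    end - start + 1 residues.
    """
    if start < 1 or end > len(reference) or start > end:
        raise ValueError("region lies outside the reference sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return candidate == reference[start - 1:end]


if __name__ == "__main__":
    reference = "MKLVNQFS"  # toy sequence for illustration only
    print(matches_region("LVN", reference, 3, 5))
```

Under this convention, residues 853 to 1291 span 439 positions, residues 449 to 1296 span 848 positions, and residues 442 to 1291 span 850 positions.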

Claims Are Arranged in Proper Sequence 

Claim 42 has been objected to as allegedly depending on a later claim. 
Applicants have amended claim 42 to replace claim 43 and vice-versa. Therefore, upon entry of 
the instant amendment, claims 42 and 43 will be in proper sequence. Applicants, therefore, 
respectfully request withdrawal of this objection. 

Claims Relate to Patentable Subject Matter 

Claims 39-43 (sic, 42-43), 45-47, and 55-56 have been rejected under 35 U.S.C. § 
101 as allegedly directed to non-statutory subject matter. The Examiner has alleged that natural 
variation in nucleotide and amino acid levels for the same or equivalent proteins is common in 
nature. 

Applicants traverse this rejection and assert that the claims, as amended herein, 
are drawn to patentable subject matter. Applicants, therefore, respectfully request withdrawal of 
this rejection. 

Claims Are Supported By Adequate Written Description 

Claims 42-51, 53-56 (sic, 53, 55-56), 82, and 85-86 have been rejected under 35 
U.S.C. § 112, first paragraph, as allegedly lacking sufficient written description. The Examiner 
has alleged that Example 8 and Figure 4 do not define a representative number of species of the 
instantly claimed genus of nucleic acid molecules. 

Applicants traverse this rejection and assert that the claims, as amended herein, 
are supported by an adequate written description. The Examiner has alleged that the phrase "a 
nucleic acid comprising a nucleic acid sequence" encompasses variant nucleic acid sequences. 
This phrase has been omitted. Applicants, therefore, respectfully request withdrawal of this 
rejection. 

Claims Are Clear and Definite 

Claim 43 has been rejected under 35 U.S.C. § 112, second paragraph, as allegedly 
indefinite in reciting "said amino acid sequence comprising at least one immunogenic epitope". 
The Examiner has alleged that it is unclear whether the immunogenic epitope is inherent to SEQ 
ID NO:8 or heterologous to SEQ ID NO:8 since the term "having" is construed to be open. 

Applicants traverse this rejection and assert that it is clear that the immunogenic 
epitope is part of SEQ ID NO:8. Claim 42 has been amended to replace claim 43 and vice-versa. 
Applicants respectfully submit that the antecedent basis for the phrase "said amino acid sequence 
comprising at least one immunogenic epitope" recited in claim 42 is the immediately prior 
phrase, namely, "the amino acid sequence of SEQ ID NO:8" (emphasis added). Thus, it is clear 
that the immunogenic epitope is part of SEQ ID NO:8. Accordingly, Applicants respectfully 
request withdrawal of this rejection. 

Claim 53 has been rejected under 35 U.S.C. § 112, second paragraph, as allegedly 
indefinite. The Examiner has alleged that the method steps are not commensurate in scope with 
the preamble. 

Applicants traverse this rejection and assert that claim 53, as amended herein, is 
clear and definite. Both the preamble and the method step recite "isolating." Applicants, 
therefore, respectfully request withdrawal of this rejection. 

Specification and Claims Are Free of New Matter 

Applicants' Preliminary Amendment and Substitute Sequence Listing mailed on 
March 5, 2003 has been objected to under 35 U.S.C. § 132 as allegedly introducing new matter. 

1. SEQ ID NO:37 

The Examiner has alleged that Figures 1, 2, and 3 do not support the amendments 
to SEQ ID NO:37. Without acquiescing in this rejection, Applicants have amended SEQ ID 
NO:37 to correct errors. This amendment is supported by Figure 2 of the '975 application to 
which the instant application claims priority. 

2. SEQ ID NOS: 7 and 39 

The Examiner has alleged that Figure 4 (SEQ ID NO: 7) does not support the 
amendments to SEQ ID NO:39 and vice versa. Without acquiescing in this rejection, Applicants 
have amended SEQ ID NOS: 7 and 39 to correct errors. These amendments are respectively 
supported by the original Figure 4 submitted with the instant application and Figure 4 of the '975 
application. 

3. SEQ ID NO:40 

The Examiner has alleged that Accession No. M81186 and SEQ ID NO:8 do not 
support the amendments to SEQ ID NO:40. Without acquiescing in this rejection, Applicants 
have amended SEQ ID NO:40 to correct errors. This amendment is supported by Figure 3 of 
Whelan (Accession No. M81186), which was incorporated by reference into the instant 
application. 

4. SEQ ID NO:41 

The Examiner has alleged that Figures 1, 2, and 3 and SEQ ID NO:38 do not 
support the amendments to SEQ ID NO:41. Without acquiescing in this rejection, Applicants 
have amended SEQ ID NO:41 to correct errors. This amendment is supported by Figure 3 of 
Thompson (Accession No. X52066), which was incorporated by reference into the instant 
application. 

5. SEQ ID NO:42 

The Examiner has alleged that Figure 4 (SEQ ID NO: 8) and Accession No. 
M81186 do not support the amendments to SEQ ID NO:42. Without acquiescing in this 
rejection, Applicants have amended SEQ ID NO:42 to correct errors. This amendment is 
supported by Figure 3 of Whelan (Accession No. M81186), which was incorporated by reference 
into the instant application. 

Thus, all of the amendments to the sequence listing are fully supported and, 
therefore, do not constitute new matter. Applicants, therefore, respectfully request withdrawal of 
this rejection. 

Claims Do Not Represent Double Patenting 

Claims 45, 48, 53, and 82 have been rejected under the judicially-created doctrine 
of obviousness-type double patenting as allegedly obvious over U.S. Patent No. 6,495,143 issued 
to Lee JS et al. on December 17, 2002. The Examiner indicated that a timely filed terminal 
disclaimer would overcome this rejection. 

Applicants respectfully invite the Examiner's attention to the Terminal Disclaimer 
enclosed herewith. A copy of the executed Assignment is also enclosed herewith. 

In summary, Applicants believe that all pending claims are in condition for 
allowance and respectfully solicit prompt favorable action. 

Applicants enclose herewith the fee required under 37 C.F.R. § 1.17(e) and 
§ 1.17(a)(3). Although Applicants do not believe that any additional fees are required with this 
paper, the Commissioner is hereby authorized to charge any fees occasioned by this submission 
not otherwise enclosed herewith to Deposit Account No. 02-4377. Please credit any 
overpayment of fees associated with this filing to the above-identified deposit account. A 
duplicate of this page is enclosed. 

Respectfully submitted, 
BAKER BOTTS, L.L.P. 



December 3, 2003 






Enclosures 



Rochelle K. Seide 
PTO Reg. No. 32,300 

Carmella L. Stephens 
PTO Reg. No. 41,328 
Attorneys for Applicants 

Guy F. Birkenmeier 
PTO Reg. No. 52,622 
Agent for Applicants 

BAKER BOTTS, L.L.P. 
30 Rockefeller Plaza 
New York, NY 10112 
(212) 408-2500 

ALIGNMENT 1: SEQ ID NO: 7 

Original_Seq7 
Substitute_Seq7 
Amended_Seq7 
Original_Fig4 



1 30 
GAATTCACGATGGCCAACAAATACAATTCC 
GAATTCACGATGGCCAACAAATACAATTCC 
GAATTCACGATGGCCAACAAATACAATTCC 
GAATTCACGATGGCCAACAAATACAATTCC 



31 60 61 90 91 120 

GAAATCCTGAACAATATCATCCTGAACCTG CGTTACAAAGACAACAATCTGATCGATCTG TCTGGTTACGGTGCTAAAGTTGAAGTATAC 
GAAATCCTGAACAATATCATCCTGAACCTG CGTTACAAAGACAACAATCTGATCGATCTG TCTGGTTACGGTGCTAAAGTTGAAGTATAC 
GAAATCCTGAACAATATCATCCTGAACCTG CGTTACAAAGACAACAATCTGATCGATCTG TCTGGTTACGGTGCTAAAGTTGAAGTATAC 
GAAATCCTGAACAATATCATCCTGAACCTG CGTTACAAAGACAACAATCTGATCGATCTG TCTGGTTACGGTGCTAAAGTTGAAGTATAC 



* * **.*"*:*.*.* 



Original_Seq7 
Substitute_Seq7 
Amended_Seq7 
Original_Fig4 



121 150 151 180 181 210 211 240 

GACGGTGTTGAACTGAATGACAAGAACCAG TTCAAACTGACCTCTTCCGCTAACTCTAAG ATCCGTGTTACTCAGAATCAGAACATCATC TTCAACTCCGTATTCCTGGACTTCTCTGTT 
GACGGTGTTGAACTGAATGACAAGAACCAG TTCAAACTGACCTCTTCCGCTAACTCTAAG ATCCGTGTTACTCAGAATCAGAACATCATC TTCAACTC CGTATTC CTC»GACTTCTCTGTT 
GACGGTGTTGAACTGAATGACAAGAACCAG TTCAAACTGACCTCTTCCGCTAACTCTAAG ATCCGTGTTACTCAGAATCAGAACATCATC TTCAACTCCGTATTCCTGGACTTCTCTGTT 
GACGGTGTTGAACTGAATGACAAGAACCAG TTCAAACTGACCTCTTCCGCTAACTCTAAG ATCCGTGTTACTCAGAATCAGAACATCATC TTCAACTCCGTATTCCTGGACTTCTCTGTT 



>'.#, '* * * * 



'*;■* *** ** *;*?*!* 



Original_Seq7 
Substitute_Seq7 
Amended_Seq7 
Original_Fig4 



241 

TCCTTCTGGAT 
TCCTTCTGGAT 
TCCTTCTGGAT 
TCCTTCTGGAT 
'*>***■#.+■**.** 



270 271 300 301 330 331 360 

|CGTATCCCGAAATACAAG AACGACGGTATCCAGAATTACATCCACAAT GAATACACCATCATCAACTGCATGAAGAAT AACTCTGGTTGGAAGATCTC CATC CGCGGT 
a CGTATC CCG AAATA C AAG AACGACGGTATCCAGAATTACATCCACAAT GAATACACCATCATCAACTGCATGAAGAAT AACTCTGGTTGGAAGATCTCCATC CGCGGT 
ECGTATC C CGAAATA CAAG AACGACGGTATCCAGAATTACATCCACAAT GAATACACCATCATCAACTGCATGAAGAAT AACTCTGGTTGGAAGATCTCCATCCGCGGT 
gCGTATCCCGAAATACAAG AACGACGGTATCCAGA ATTACA TCCACAAT GAATACACCATCATCAACTGCATGAAGAAT AACTCTGGTTGGAAGATCTCCATCCGCGGT 



w y* *"* *v* *>>** 



* * * * * * * 



Original_Seq7 
Substitute_Seq7 
Amended_Seq7 
Original_Fig4 



361 390 
AACCGTATCATCTGGACTCTGATCGATATC 
AACCGTATCATCTGGACTCTGATCGATATC 
AACCGTATCATCTGGACTCTGATCGATATC 
AACCGTATCATCTGGACTCTGATCGATATC 



******** 



391 420 
AACGGTAAGACCAAATCTGTATTCTTCGAA 
AACGGTAAGACCAAATCTGTATTCTTCGAA 
AACGGTAAGACCAAATCTGTATTCTTCGAA 
AACGGTAAGACCAAATCTGTATTCTTCGAA 



421 450 451 480 

TACAACATCCGTGAAGACATCTCTGAATAC ATCAATCGCTGGTTCTTCGTTACCATCACC 
TACAACATCCGTGAAGACATCTCTGAATAC ATCAATCGCTGGTTCTTCGTTACCATCACC 
TACAACATCCGTGAAGACATCTCTGAATAC ATCAATCGCTGGTTCTTCGTTACCATCACC 
T^AACATCCGTGAAGACATCTCTGAATAC ATCAATCGCTGGTTCTTCGTTACCATCACC 



481 510 511 540 

Original_Seq7 AATAACCTGAACAATGCTAAAATCTACATC AACGGTAAACTGGAATCTAATACCGACATC 

Substitute_Seq7 AATAACCTGAACAATGCTAAAATCTACATC AACGGTAAACTGGAATCTAATACCGACATC 

Amended_Seq7 AATAACCTGAACAATGCTAAAATCTACATC AACGGTAAACTGGAATCTAATACCGACATC 

Original_Fig4 AATAACCTG^C^TGCTAAAATCTACATC AACGGTAAACTGGAATCTAATACCGACATC 



541 570 571 600 

AAAGACATCCGTGAAGTTATCGCTAACGGT GAAATCATCTTCAAACTGGACGGTGACATC 
AAAGACATCCGTGAAGTTATCGCTAACGGT GAAATCATCTTCAAACTGGACGGTGACATC 
AAAGACATCCGTGAAGTTATCGCTAACGGT GAAATCATCTTCAAACTGGACGGTGACATC 
AAAGACATCCGTGAAGTTATCGCTAACGGT GAAATCATCTTCAAACTGGACGGTGACATC 



Original_Seq7 
Substitute_Seq7 
Amended_Seq7 
Original_Fig4 



601 630 631 660 661 690 691 720 

GATCGTACCCAGTTCATCTGGATGAAATAC TTCTCCATCTTCAACACCGAACTGTCTCAG TCCAATATCGAAGAACGGTACAAGATCCAG TCTTACTCCGAATACCTGAAAGACTTCTGG 
GATCGTACCCAGTTCATCTGGATGAAATAC TTCTCCATCTTCAACACCGAACTGTCTCAG TCCAATATCGAAGAACGGTACAAGATCCAG TCTTACTCCGAATACCTGAAAGACTTCTGG 
GATCGTACCCAGTTCATCTGGATGAAATAC TTCTCCATCTTCAACACCGAACTGTCTCAG TCCAATATCGAAGAACGGTACAAGATCCAG TCTTACTCCGAATACCTGAAAGACTTCTGG 
9^91^9 C .^S^J^ C ^ A ^ MA ' rAC TTCTCCATCTTCAACACCGAACTGTCTCAG TCCAATATC GAAGAAC GGTACAAGATCCAG TCTTACTCCGAATACCTGAAAGACTTCTGG 



Original_Seq7 
Substitute_Seq7 
Amended_Seq7 
Original_Fig4 



721 750 751 780 781 810 811 840 

GGTAATCCGCTGATGTACAACAAAGAATAC TATATGTTCAATGCTGGTAACAAGAACTCT TACATCAAACTGAAGAAAGACTCTCCGGTT GGTGAAATCCTGACTCGTTCCAAATACAAC 
GGTAATCCGCTGATGTACAACAAAGAATAC TATATGTTCAATGCTGGTAACAAGAACTCT TACATCAAACTGAAGAAAGACTCTCCGGTT GGTGAAATCCTGACTCGTTCCAAATACAAC 
GGTAATCCGCTGATGTACAACAAAGAATAC TATATGTTCAATGCTGGTAACAAGAACTCT TACATCAAACTGAAGAAAGACTCTCCGGTT GGTGAAATCCTGACTCGTTCCAAATACAAC 
^T^l^^ATG^CAACAAAGAATAC TATATGTTCAATGCTGGTAACAAGAACTCT TACATCAAACTGAAGAAAGACTCTCCGGTT GGTGAAATCCTGACTCGTTCCAAATACAAC 



'*.*.*!*"# *.*.**#:* 






* * * * * * * * * * i 



'*>>>>;*>,*: 



Original_Seq7 
Substitute_Seq7 
Amended_Seq7 
Original_Fig4 



841 870 871 900 901 930 931 960 

CAGAACTCTAAATACATCAACTACCGCGAC CTGTACATCGGTGAAAAGTTCATCATCCGT CGCAAATCTAACTCTCAGTCCATCAATGAT GACATCGTACGTAAAGAAGACTACATCTAC 
CAGAACTCTAAATACATCAACTACCGCGAC CTGTACATCGGTGAAAAGTTCATCATCCGT CGCAAATCTAACTCTCAGTCCATCAATGAT GACATCGTACGTAAAGAAGACTACATCTAC 
CAGAACTCTAAATACATCAACTACCGCGAC CTGTACATCGGTGAAAAGTTCATCATCCGT CGCAAATCTAACTCTCAGTCCATCAATGAT GACATCGTACGTAAAGAAGACTACATCTAC 
H^^TCTA^TA^ CTGTACATCGGTGAAAAGTTCATCATCCGT CGCAAATCTAACTCTCAGTCCATCAATGAT GACATCGTACGTAAAGAAGACTACATCTAC 




Original_Seq7 
Substitute_Seq7 
Amended_Seq7 
Original_Fig4 



961 990 991 1020 1021 1050 1051 1080 

CTGGACTTCTTCAACCTGAATCAGGAATGG CGTGTATACACCTACAAGTACTTCAAGAAA GAAGAAGAAAAGCTTTTCCTGGCTCCGATC TCTGATTCCGACGAACTCTACAACACCATC 
CTGGACTTCTTCAACCTGAATCAGGAATGG CGTGTATACACCTACAAGTACTTCAAGAAA GAAGAAGAAAAGCTTTTCCTGGCTCCGATC TCTGATTCCGACGAACTCTACAACACCATC 
CTGGACTTCTTCAACCTGAATCAGGAATGG CGTGTATACACCTACAAGTACTTCAAGAAA GAAGAAGAAAAGCTTTTCCTGGCTCCGATC TCTGATTCCGACGAACTCTACAACACCATC 
^^£n.^S^?7P^ TC ^99^Vf G 99J9BIACACCTACAAGTACTTCAAGAAA GAAGAAGAAAAGCTTTTCCTGGCTCCGATC TCTGATTCCGACGAACTCTACAACACCATC 



W *•*:*•* *********** 



WW*.**,*,*,******,**.** 




* ** * * ** * * * 



*,*.*.*.**,*,**. 



Original_Seq7 
Substitute_Seq7 
Amended_Seq7 
Original_Fig4 



1081 1110 1111 1140 1141 1170 1171 1200 

CAGATCAAAGAATACGACGAACAGCCGACC TACTCTTGCCAGCTGCTGTTCAAGAAAGAT GAAGAATCTACTGACGAAATCGGTCTGATC GGTATCCACCGTTTCTACGAATCTGGTATC 
CAGATCAAAGAATACGACGAACAGCCGACC TACTCTTGCCAGCTGCTGTTCAAGAAAGAT GAAGAATCTACTGACGAAATCGGTCTGATC GGTATCCACCGTTTCTACGAATCTGGTATC 
CAGATCAAAGAATACGACGAACAGCCGACC TACTCTTGCCAGCTGCTGTTCAAGAAAGAT GAAGAATCTACTGACGAAATCGGTCTGATC GGTATCCACCGTTTCTACGAATCTGGTATC 
^9 A S^^T A ^ GAC S A ^ CAGCCGACC I A ^ C ™^ AG SS£^^^ G ^ GA T GAAGAATCTACTGACGAAATCGGTCTGATC GGTATCCACCGTTTCTACGAATCTGGTATC 




Original_Seq7 
Substitute_Seq7 
Amended_Seq7 
Original_Fig4 



1201 1230 1231 1260 1261 1290 1291 1320 

GTATTCGAAGAATACAAAGACTACTTCTGC ATCTCCAAATGGTACCTGAAGGAAGTTAAA CGCAAACCGTACAACCTGAAACTGGGTTGC AATTGGCAGTTCATCCCGAAAGACGAAGGT 
GTATTCGAAGAATACAAAGACTACTTCTGC ATCTCCAAATGGTACCTGAAGGAAGTTAAA CGCAAACCGTACAACCTGAAACTGGGTTGC AATTGGCAGTTCATCCCGAAAGACGAAGGT 
GTATTCGAAGAATACAAAGACTACTTCTGC ATCTCCAAATGGTACCTGAAGGAAGTTAAA CGCAAACCGTACAACCTGAAACTGGGTTGC AATTGGCAGTTCATCCCGAAAGACGAAGGT 
G l A ^^9^7^9^ Gk ^ hmC?GC ATCTCCAAATGGTACCTGAAGGAAGTTAAA CGCAAACCGTACAACCTGAAACTGGGTTGC AATTGGCAGTTCATCCCGAAAGACGAAGGT 



* ******** 



1321 1341 
Original_Seq7 TGGACCGAATAGTAAGAATTC 
Substitute_Seq7 TGGACCGAATAGTAAGAATTC 
Amended_Seq7 TGGACCGAATAGTAAGAATTC 
Original_Fig4 TGGACCGAATAGTAAGAATTC 

ALIGNMENT 2: SEQ ID NO: 37 



1 30 31 60 61 90 91 120 

Original_Seq37 CTCGAGCCATGGCTCGTCTGCTGTCTACCT TCACTGAATACATCAAGAACATCATCAATA CCTCCATCCTGAACCTGCGCTACGAATCCA ATCACCTGATCGACCTGTCTCGCTACGCTT 

Substitute_Seq37 CTCGAGCCATGGCTCGTCTGCTGTCTACCT TCACTGAATACATCAAGAACATCATCAATA CCTCCATCCTGAACCTGCGCTACGAATCCA ATCACCTGATCGACCTGTCTCGCTACGCTT 

Amended_Seq37 CTCGAGCCATGGCTCGTCTGCTGTCTACCT TCACTGAATACATCAAGAACATCATCAATA CCTCCATCCTGAACCTGCGCTACGAATCCA ATCACCTGATCGACCTGTCTCGCTACGCTT 

Fig2_' 975_App ^C<^GCCAJ^CTCGrcrGCrGT<^AC^ TCACTGAATACATCAAGAACATCATCAATA CCTCCATCCTGAACCTGCGCTACGAATCCA ATCACCTGATCGACCTGTCTCGCTACGCTT 



*.*.* **.* ******** 



* *.*,*'*'*>■** 



Original_Seq37 
Substitute_Seq37 
Amended_Seq37 
Fig2_'975_App 



121 150 151 180 181 210 211 240 

CCAAAATCAACATCGGTTCTAAA|TTAACT TCGATCCGATCGACAAGAATCAGATCCAGC TGTTCAATCTGGAATCTTCCAAAATCGAAG TTATCCTGAAGAATGCTATCGTATACAACT 
CCAAAATCAACATCGGTTCTAAA|TTAACT TCGATCCGATCGACAAGAATCAGATCCAGC TGTTCAATCTGGAATCTTCCAAAATCGAAG TTATCCTGAAGAATGCTATCGTATACAACT 
CCAAAATCAACATCGGTTCTAAA|TTAACT TCGATCCGATCGACAAGAATCAGATCCAGC TGTTCAATCTGGAATCTTCCAAAATCGAAG TTATCCTGAAGAATGCTATCGTATACAACT 
P C r *^J5^ CA JCG<5TTCTAAAgTTAACT TCGATCCGATCGACA AGAATCAGA TCCAGC TGTTCAATCTGGAATCTTCCAAAATCGAAG TTATCCTGAAGAATG CTATC GTATACAACT 



*> *.* 



Original_Seq37 
Substitute_Seq37 
Amended_Seq37 
Fig2_'975_App 



241 270 271 

CTATGTACGAAAACTTCTCCACCTCCTTCT GGATCCGTATCCCj 
CTATGTACGAAAACTTCTCCACCTCCTTCT GGATCCGTATCCC 
CTATGTACGAAAACTTCTCCACCTCCTTCT GGATCCGTATCCCJ 
CTATGTACGAAAACTTCTCCACCTCCTTCT GGATCCGTATCCCj 
****** * V** ******************* * * * * *V* * * * * 



300 301 330 331 360 

ATACTTCAACTCCA TCTCTCTGAACAATGAATACACCATCATCA ACTGCATGGAAAACAATTCTGGTTGGAAAG 
ATACTTCAACTCCA TCTCTCTGAACAATGAATACACCATCATCA ACTGCATGGAAAACAATTCTGGTTGGAAAG 
ATACTTCAACTCCA TCTCTCTGAACAATGAATACACCATCATCA ACTGCATGGAAAACAATTCTGGTTGGAAAG 
ATACTTCAACTCCA TCTCTCTGAACAATGAATACACCATCATCA ACTGCATGGAAAACAATTCTGGTTGGAAAG 



**************, ,***** 



*** ******** 



********** **** 



Original_Seq37 
Substitute_Seq37 
Amended_Seq37 
Fig2_'975_App 



361 390 391 420 

TATCTCTGAACTACGGTGAAATCATCTGGA CTCTGCAGGACACTCAGGAAATCAAACAGC 
TATCTCTGAACTACGGTGAAATCATCTGGA CTCTGCAGGACACTCAGGAAATCAAACAGC 
TATCTCTGAACTACGGTGAAATCATCTGGA CTCTGCAGGACACTCAGGAAATCAAACAGC 
TATCTCTGAACTACC^TGAAATCATCTGGA CTCTGCAGGACACTCAGGAAATCAAACAGC 



421 450 451 480 

GTGTTGTATTCAAATACTCTCAGATGATCA ACATCTCTGACTACATCAATCGCTGGATCT 
GTGTTGTATTCAAATACTCTCAGATGATCA ACATCTCTGACTACATCAATCGCTGGATCT 
GTGTTGTATTCAAATACTCTCAGATGATCA ACATCTCTGACTACATCAATCGCTGGATCT 
GTfflTGTATTCAAATACTCT^GATGATCA ACATCTCTGACTACATCAATCGCTGGATCT 
+ * * * V* ****** *'* *> .*:***' :* * * * V*^'*^*,*^*';* ♦• : *" ; ***' 



Original_Seq37 
Substitute_Seq37 
Amended_Seq37 
Fig2_'975_App 



481 510 511 540 541 570 571 

TCGTTACCATCACCAACAATCGTCTGAATA ACTCCAAAATCTACATCAACGjCCGTCTGA TCGACCAGAAACCGATCTCCAATCTGGGTA ACATCCACGl 
TCGTTACCATCACCAACAATCGTCTGAATA ACTCCAAAATCTACATCAACg|cCGTCTGA TCGACCAGAAACCGATCTCCAATCTGGGTA ACATCCACG 
TCGTTACCATCACCAACAATCGTCTGAATA ACTCCAAAATCTACATCAACGJCCGTCTGA TCGACCAGAAACCGATCTCCAATCTGGGTA ACATCCACG 
TCGTTACCATCACCAACAATCGTCTGAATA ACTCCAAAATCTACATCAACG|CCGTCTGA TCGACCAGAAACCGATCTCCAATCTGGGTA ACATCCAC 



600 

TAATAACATCATGTTCA 
TAATAACATCATGTTCA 
TAATAACATCATGTTCA 
TAATAACATCATGTTCA 



Original_Seq37 
Substitute_Seq37 
Amended_Seq37 
Fig2_'975_App 



601 630 631 660 661 690 691 720 

AACTGGACGGTTGTCGTGACACTCACCGCT ACATCTGGATCAAATACTTCAATCTGTTCG ACAAAGAACTGAACGAAAAAGAAATCAAAG ACCTGTACGACAACCAGTCCAATTCTGGTA 
AACTGGACGGTTGTCGTGACACTCACCGCT ACATCTGGATCAAATACTTCAATCTGTTCG ACAAAGAACTGAACGAAAAAGAAATCAAAG ACCTGTACGACAACCAGTCCAATTCTGGTA 
AACTGGACGGTTGTCGTGACACTCACCGCT ACATCTGGATCAAATACTTCAATCTGTTCG ACAAAGAACTGAACGAAAAAGAAATCAAAG ACCTGTACGACAACCAGTCCAATTCTGGTA 
^ Cn ^9^^9 TCG ^ Ck ^ CACC 9P T . ACATCTC^ TCAAATACTTCAATCTGTTCG ACAAAG AACTGAACGAAAAAGAAATCAAAG ACCTGTACGACAACCAGTCCAATTCTGGTA 




Original_Seq37 
Substitute_Seq37 
Amended_Seq37 
Fig2_'975_App 



721 750 751 780 781 810 811 840 

TCCTGAAAGACTTCTGGGGTGACTACCTGC AGTACGACAAACCGTACTACATGCTGAATC TGTACGATCCGAACAAATACGTTGACGTCA ACAATGTAGGTATCCGCGGTTACATGTACC 
TCCTGAAAGACTTCTGGGGTGACTACCTGC AGTACGACAAACCGTACTACATGCTGAATC TGTACGATCCGAACAAATACGTTGACGTCA ACAATGTAGGTATCCGCGGTTACATGTACC 
TCCTGAAAGACTTCTGGGGTGACTACCTGC AGTACGACAAACCGTACTACATGCTGAATC TGTACGATCCGAACAAATACGTTGACGTCA ACAATGTAGGTATCCGCGGTTACATGTACC 
TCCTGAAAGACTTCTGGGGTGACTACCTGC AGTACGACAAACCGTACTACATGCTGAATC TGTACGATCCGAACAAATACGTTGACGTCA ACAATGTAGGTATCCGCGGTTACATGTACC 




Original_Seq37 
Substitute_Seq37 
Amended_Seq37 
Fig2_'975_App 






841 870 871 900 901 930 931 960 

TGAAAGGTCCGCGTGGTTCTGTTATGACTA CCAACATCTACCTGAACTCTTCCCTGTACC GTGGTACCAAATTCATCATCAAGAAATACG CGTCTGGTAACAAGGACAATATfSICGCA 
TGAAAGGTCCGCGTGGTTCTGTTATGA CTA CCAACATCTACCTGAACTCTTCCCTGTACC GTGGTACCAAATTCATCATCAAGAAATACG CGTCTGGTAACAAGGACAATA'I^C^ScGCA 
TGAAAGGTCCGCGTGGTTCTGTTATGACTA C C AACATCTACCTGAACTCTTC C CTGTACC GTGGTACCAAATTCATCATCAAGAAATACG CGTCTGGTAACAAGGACAATAtS^CGCA 
^^^T^GCGTGGTTCTCTTATGACTA CCAACATCTACCTGAACTCTTCCCTGTACC GTGGTACCAAATTCATCATCAAGAAATACG CGTCTGGTAACAAGGACAATATpi|cGCA 



961 990 
ACAATGATCGTGTATACATCAATGTTGTAG 
ACAATGATCGTGTATACATCAATGTTGTAG 
ACAATGATCGTGTATACATCAATGTTGTAG 
ACAATGATCGTGTATACATCAATGTTGTAG 
* * * * V* * *'* * * * *"*"* *********** *"*'* * 



991 1020 1021 1050 1051 1080 

TTAAGAACAAAGAATACCGTCTGGCTACCA ATGCTTCTCAGGCTGGTGTAGAAAAGATCT TGTCTGCTCTG<3AAATCCCGGACGTTGGTA 
TTAAGAACAAAGAATACCGTCTGGCTACCA ATGCTTCTCAC^CTGGTGTAGAAAAGATCT TGTCTGCTCTGGAAATC CCGGACGTTGGTA 
TTAAGAACAAAGAATACCGTCTGGCTACCA ATGCTTCTCAGGCTGGTGTAGAAAAGATCT TGTCTGCTCTGGAAATCCCGGACGTTGGTA 
!^^ A ^ C ^ G ^J ACCGTCTGGCTAC CA ATGCTTCTCAGGCTGGTGTAGAAAAGATCT TGTCTGCTCTGGAAATCCCGGACGTTGGTA 



Original_Seq37 
Substitute_Seq37 
Amended_Seq37 
Fig2_'975_App 



1081 1110 
ATCTGTCTCAGGTAGTTGTAATGAAATCCA 
ATCTGTCTCAGGTAGTTGTAATGAAATCCA 
ATCTGTCTCAGGTAGTTGTAATGAAATCCA 

* *% *'*V*y*'*:*1» *t^* * *.*.* * * * *'**'• • ^ ,,T •"■ 



1111 1140 
AGAACGACCAGGGTATCACTAACAAATGCA 
AGAACGACCAGGGTATCACTAACAAATGCA 
AGAACGACCAGGGTATCACTAACAAATGCA 
AGAACGACCAGGGTATCACTAACAAATGCA 



1141 1170 1171 1200 

AAATGAATCTGCAGGACAACAATGGTAACG ATATCGGTTTCATCGGTTTCCACCAGTTCA 
AAATGAATCTGCAGGACAACAATGGTAACG ATATCGGTTTCATCGGTTTCCACCAGTTCA 
AAATGAATCTGCAGGACAACAATGGTAACG ATATCGGTTTCATCGGTTTCCACCAGTTCA 
AAATGAATCTGCAGGACAACAATGGTAACG ATATCGGTTTCATCGGTTTCCACCAGTTCA 



Original_Seq37 
Substitute_Seq37 
Amended_Seq37 
Fig2_'975_App 



1201 1230 1231 1260 1261 1290 1291 1320 

ACAATATCGCTAAACTGGTTGCTTCCAACT GGTACAATCGTCAGATCGAACGTTCCTCTC GCACTCTGGGTTGCTCTTGGGAGTTCATCC CGGTTC^TGACGGTTGGGGTGAACGTCCGC 
ACAATATCGCTAAACTGGTTGCTTCCAACT GGTACAATCGTCAGATCGAACGTTCCTCTC GCACTCTC^TTGCTCITGGGAGTTCATCC CGGTTGATGACGGTTGGGGTGAACGTCCGC 
ACAATATCGCTAAACTGGTTGCTTCCAACT GGTACAATCGTCAGATCGAACGTTCCTCTC GCACTCTGGGTTGCTCTTGGGAGTTCATCC CGGTTGATGACGGTTGGGGTGAACGTCCGC 
^9^It'E^^^^^ T 79 C ^P^^ r GGTACAATCGTCAGATCGAACGTTCCTCTC GCACTCTGGGTTGCTCTTGGGAGTTCATCC CGGTTGATGACGGTTGGGGTGAACGTCCGC 




*>*,**'* ** 



* . * ■ *r* *v* ■* v +' v* ■v* < * 



******.** 



Original_Seq37 
Substitute_Seq37 
Amended_Seq37 
Fig2_'975_App 



1321 1338 
TGTAACCCGGGAAAGCTT 
TGTAACCCGGGAAAGCTT 
TGTAACCCGGGAAAGCTT 
TGTAACCCGGGAAAGCTT 

ALIGNMENT 3: SEQ ID NO: 39 



Original_Seq39 
Substitute_Seq39 
Amended_Seq39 
Fig4_'975_App 



1 30 31 60 61 90 91 120 

ATGGC|gCAACAAATACAATTCCGAAATC CTGAACAATATCATCCTGAACCTGCGTTAC AAAGACAACAATCTGATCGATCTGTCTGGT TACGGTGCTAAAGTTGAAGTATACGACGGT 
ATGGCj C AACAAATACAATTC CGAAATC CTGAACAATATCATCCTGAACCTGCGTTAC AAAGACAACAATCTGATCGATCTGTCTGGT TACGGTGCTAAAGTTGAAGTATACGACGGT 
ATGGC^gCAACAAATACAATTCCGAAATC CTGAACAATATCATCCTGAACCTGCGTTAC AAAGACAACAATCTGATCGATCTGTCTGGT TACGGTGCTAAAGTTGAAGTATACGACGGT 
ATGGqgljicAACAAATACAATTCCGAAATC CTGAACAATATCATCCTGAACCTGCGTTAC AAAGACAACAATCTGATCGATCTGTCTGGT TACGGTGCTAAAGTTGAAGTATACGACGGT 



*»,*"*»* 



Original_Seq39 
Substitute_Seq39 
Amended_Seq39 
Fig4_'975_App 



121 150 151 180 181 210 211 240 

GTTGAACTGAATGACAAGAACCAGTTCAAA CTGACCTCTTCCGCTAACTCTAAGATCCGT GTTACTCAGAATCAGAACATCATCTTCAAC TCCGTATTCCTGGACTTCTCTGTTTCCTTC 
GTTGAACTGAATGACAAGAACCAGTTCAAA CTGACCTCTTCCGCTAACTCTAAGATCCGT GTTACTCAGAATCAGAACATCATCTTCAAC TCCGTATTCCTGGACTTCTCTGTTTCCTTC 
GTTGAACTGAATGACAAGAACCAGTTCAAA CTGACCTCTTCCGCTAACTCTAAGATCCGT GTTACTCAGAATCAGAACATCATCTTCAAC TCCGTATTCCTGGACTTCTCTGTTTCCTTC 
GTTGAACTGAATGACAAGAACCAGTTCAAA CTCACCTCTTCCGCTAACTCTAAGATCCGT GTTACTCAGAATCAGAACATCATCTTCAAC TCCGTATTCCTGGACITCTCTGTTTCCTT 



241 270 271 300 301 330 331 360 

Original_Seq39 TGGATCCGTATCCCGAAATACAAGAACGAC GGTATCCAGAATTACATCCACAATGAATAC ACCATCATCAACTGCATGAAGAATAACTCT GGTTGGAAGATCTCCATCCGCGGTAACCGT 

Substitute_Seq39 TGGATCCGTATCCCGAAATACAAGAACGAC GGTATCCAGAATTACATCCACAATGAATAC ACCATCATCAACTGCATGAAGAATAACTCT GGTTGGAAGATCTCCATCCGCGGTAACCGT 

Amended_Seq39 TGGATCCGTATCCCGAAATACAAGAACGAC GGTATCCAGAATTACATCCACAATGAATAC ACCATCATCAACTGCATGAAGAATAACTCT GGTTGGAAGATCTCCATCCGCGGTAACCGT 

Fig4_'975_App TGGATCCGTATCCCGAAATACAAGAACGAC GGTATCCAGAATTACATCCACAATGAATAC ACCATCATCAACTGCATGAAGAATAACTCT GGTTGGAAGATCTCCATCCGCGGTAACCGT 



******** * * * 



******** 



* * * * * * ********* 



361 390 391 420 421 450 451 480 

Original_Seq39 ATCATCTGGACTCTGATCGATATCAACGGT AAGACCAAATCTGTATTCTTCGAATACAAC ATCCGTGAAGACATCTCTGAATACATCAAT CGCTGGTTCTTCGTTACCATCACCAATAAC 

Substitute_Seq39 ATCATCTGGACTCTGATCGATATCAACGGT AAGACCAAATCTGTATTCTTCGAATACAAC ATCCGTGAAGACATCTCTGAATACATCAAT CGCTGGTTCTTCGTTACCATCACCAATAAC 

Amended_Seq39 ATCATCTGGACTCTGATCGATATCAACGGT AAGACCAAATCTGTATTCTTCGAATACAAC ATCCGTGAAGACATCTCTGAATACATCAAT CGCTGGTTCTTCGTTACCATCACCAATAAC 

Fig4_'975_App ATCATCTGGACTCTGATCGATATCAACGGT AAGACCAAATCTGTATTCTTCGAATACAAC ATCCGTGAAGACATCTCTGAATACATCAAT CGCTGGTTCTTCGTTACCATCACCAATAAC 



******* * *. 



Original_Seq39 
Substitute_Seq39 
Amended_Seq39 
Fig4_'975_App 



481 510 
CTGAAC AATGCTAAAATCTA CATC AACGGT 
CTGAACAATGCTAAAATCTACATCAACGGT 
CTGAA CAATGCTAAAATCTAC ATCAA CGGT 
CTGAACAATGCTAAAATCTACATCAACGGT 
.*'* * *'*V* *jV* * * * * * * ******** V* 



511 540 541 570 

AAACTGGAATCT AATA CCGACATCAAAGAC ATCCGTGAAGTTATCGCTAACGGTGAAATC 
AAACTGGAATCTAATACCGACATCAAAGAC ATCCGTGAAGTTATCGCTAACGGTGAAATC 
AAACTGGAATCTAATACCGACATCAAAGAC ATCCGTGAAGTTATCGCTAACGGTGAAATC 
AAACTGGMTCTAATACCGA ATCCGTG AAGTTA TCGCTAACGGTGAAATC 



571 600 
ATCTTCAAACTGGACGGTGACATCGATCGT 
ATCTTCAAACTGGACGGTGACATCGATCGT 
ATCTTCAAACTGGACGGTGACATCGATCGT 
ATCTTCAAACTGGACGGTGACATCGATCGT 
*;*.*>:*,£^ 



Original_Seq39 
Substitute_Seq39 
Amended_Seq39 
Fig4_'975_App 



601 630 
ACCCAGTTCATCTGGATGAAATACTTCTCC 
ACCCAGTTCATCTGGATGAAATACTTCTCC 
ACCCAGTTCATCTGGATGAAATACTTCTCC 
ACCCAGTTCATCTGGATGAAATACTTCTCC 



********** 



631 660 661 690 691 720 

ATCTTCAACACCGAACTGTCTCAGTCCAAT ATCGAAGAACGGTACAAGATCCAGTCTTAC TCCGAATACCTGAAAGACTTCTGGGGTAAT 
ATCTTCAACACCGAACTGTCTCAGTCCAAT ATCGAAGAACGGTACAAGATCCAGTCTTAC TCCGAATACCTGAAAGACTTCTGGGGTAAT 
ATCTTCAACACCGAACTGTCTCAGTCCAAT ATCGAAGAACGGTACAAGATCCAGTCTTAC TCCGAATACCTGAAAGACTTCTGGGGTAAT 
^^S^^. C 9^ C ^' rCTCAG ' rccAA ' r ATCGAAGAACGGTACAAGATCCAGTCTTAC TCCGAATACCTGAAAGACTTCTGGGGTAAT 



******* *"* * * * * * * * 



******»"*** 



721 750 751 780 781 810 811 840 

Original_Seq39 CCGCTGATGTACAACAAAGAATACTATATG TTCAATG CTGGTAACAAGAACTCTTA CATC AAA CTGAAGAAAGACTCTC CGGTTGGTGAA ffTCCTGACTCGTTCCAAATACAACCAGAAC 

Substitute_Seq39 CCGCTGATGTACAACAAAGAATACTATATG TTCAATGCTGGTAACAAGAACTCTTACATC AAACTGAAGAAAGACTCTCCGGTTGGTGAA ItCCTGACTCGTTCCAAATACAACCAGAAC 

Amended_Seq39 CCGCTGATGTACAACAAAGAATACTATATG TTCAATGCTGGTAACAAGAACTCTTACATC AAACTGAAGAAAGACTCTCCGGTTGGTGAA ItCCTGACTCGTTCCAAATACAACCAGAAC 

Fig4_'975_App CCGCTGA^TGTACAACAAAGAATACTATATG TTCAATGCTGGTAACAAGAACTCTTACATC AAACTGAAGAAAGACTCTCCGGTTGGTGAA ItCCTGACTCGTTCCAAATACAACCAGAAC 



*;.*;*;*:*;***.*.**■* 



>***'**.*' 



841 870 871 900 901 

Original_Seq39 TCTAAATACATCAACTACCGCGACCTGTAC ATCGGTGAAAAGTTCATCATCCGTCGCAAA TCTAACTCTCAGTCCATCAATGA 

Substitute_Seq39 TCTAAATACATCAACTACCGCGACCTGTAC ATCGGTGAAAAGTTCATCATCCGTCGCAAA TCTAACTCTCAGTCCATCAAT 

Amended_Seq39 TCTAAATACATCAACTACCGCGACCTGTAC ATCGGTGAAAAGTTCATCATCCGTCGCAAA TCTAACTCTCAGTCCATCAAT 

Fig4_'975_App I^AAATACATC^CTACCGCGACCTGTAC ATCGGTGAAAAGTTCATCATCCGTCGCAAA TCTAACTCTCAGTCCATCAAT 



*"*"**,*.■*:*.*"** 



930 931 960 
jjGACATC GTACGTAAAGAAGACTACATCTACCTGGAC 
VCATC GTACGTAAAGAAGACTACATCTACCTGGAC 
VCATC GTACGTAAAGAAGACTACATCTACCTGGAC 
3ACATC GTACGTAAAGAAGACTACATCTACCTGGAC 
*>**>t*T*^*^*"* v ^ 



Original_Seq39 
Substitute_Seq39 
Amended_Seq39 
Fig4_'975_App 



961 

TTCTTCAAC CTGAATC A< 
TTCTTCAACCTGAATCA* 
TTCTTCAAC CTGAATCA* 



991 1020 1021 1050 1051 1080 

TACACCTACAAGTACTTCAAGAAAGAAGAA GAAAAGCTTTTCCTGGCTCCGATCTCTGAT TCCGACGAACTCTACAACACCATCCAGATC 

TACACCTACAAGTACTTCAAGAAAGAAGAA GAAAAGCTTTTCCTGGCTCCGATCTCTGAT TCCGACGAACTCTACAACACCATCCAGATC 

TACACCTACAAGTACTTCAAGAAAGAAGAA GAAAAGCTTTTCCTGGCTCCGATCTCTGAT TCCGACGAACTCTACAACACCATCCAGATC 

TTCTTCAACCTGAATCAGGAATGGCGTGTA TACACCTACAAGTACTTCAAGAAAGAAGAA GAAAAGCTTTTCCTGGCTCCGATCTCTGAT TCCGACGAACTCTACAACACCATCCAGATC 




990 

ATGGCGTGTA 
ATGGCGTGTA 
ATGGCGTGTA 



.A,**;*',*.*'**.*'*.*;*;*'*** 



Original_Seq39 
Substitute_Seq39 
Amended_Seq39 
Fig4_'975_App 



1081 1110 1111 1140 1141 1170 1171 1200 

AAAGAATACGACGAACAGCCGACCTACTCT TGCCAGCTGCTGTTCAAGAAAGATGAAGAA TCTACTGACGAAATCGGTCTGATCGGTATC CACCGTTTCTACGAATCTGGTATCGTATTC 
AAAGAATACGACGAACAGCCGACCTACTCT TGCCAGCTGCTGTTCAAGAAAGATGAAGAA TCTACTGACGAAATCGGTCTGATCGGTATC CACCGTTTCTACGAATCTGGTATCGTATTC 
AAAGAATACGACGAACAGCCGACCTACTCT TGCCAGCTGCTGTTCAAGAAAGATGAAGAA TCTACTGACGAAATCGGTCTGATCGGTATC CACCGTTTCTACGAATCTGGTATCGTATTC 
.Aj^^^ACG^CAGCCGACCTACTCT TGCCAGCTGCTGTTCAAGAAAGATGAAGAA TCTACTGACGAAATCGGTCTGATCGGTATC CACCGTTTCTACGAATCTGGTATCGTATTC 



Original_Seq39 
Substitute_Seq39 
Amended_Seq39 
Fig4_'975_App 



1201 1230 1231 1260 1261 1290 1291 1320 

GAAGAATACAAAGACTjjCTTCTGCATCTCC AAATGGTACCTGAAGGAAGTTAAACGCAAA CCGTACAACCTGAAACTGGGTTGCAATTGG CAGTTCATCCCGAAAGACGAAGGTTGGACC 

GAAGAATACAAAGACTgCTTCTGCATCTCC AAATGGTACCTGAAGGAAGTTAAACGCAAA CCGTACAACCTGAAACTGGGTTGCAATTGG CAGTTCATCCCGAAAGACGAAGGTTGGACC 

GAAGAATACAAAGACTgCTTCTGCATCTCC AAATGGTACCTGAAGGAAGTTAAACGCAAA CCGTACAACCTGAAACTGGGTTGCAATTGG CAGTTCATCCCGAAAGACGAAGGTTGGACC 

^^^I^p^^CT^CTTCTGCATCTCC AAATGGTACCTGAAGGAAGTTAAACGCAAA CCGTACAACCTGAAACTGGGTTGCAATTGG CAGTTCATCCCGAAAGACGAAGGTTGGACC 




Original_Seq39 
Substitute_Seq39 
Amended_Seq39 
Fig4_'975_App 



1321 1350 1351 

GAATAGTAACCTCTAGAGTCGAGGCCTGCA G 
GAATAGTAACCTCTAGAGTCGAGGCCTGCA G 
GAATAGTAACCTCTAGAGTCGAGGCCTGCA G 
GAATAGTAACCTCTAGAGTCGAGGCCTGCA G 



NY02;465989.1 



ALIGNMENT 4: SEQ ID NO: 40 



Original_Seq40 
Substitute_Seq40 
Amended_Seq40 
Whelan_M81186 



MPVTINNFNYNDPIDNNNI IMMEPPFARGT GRYYKAFKITDRIWIIPERYTFGYKPEDFN KSSGI FNRDVCE YYDPDYLNTNDKKN I FLQ TMIKLFNRIKSKPLGEKLLEMI INGIPYLG 



Original_Seq40 
Substitute_Seq40 
Amended_Seq40 
Whelan_M81186 



DRRVPLEEFNTNIASVTVNKLISNPGEVER KKG I FANL I I FGPGPVLNENET I D I G I QNH FASREGFGG I MQMKFCPE YVSVFNNVQENK GASIFNRRGYFSDPALILMHELIHVLHGLY 



Original_Seq40 
Substitute_Seq40 
Amended_Seq40 
Whelan_M81186 



GIKVDDLPIVPNEKKFFMQSTDAIQAEELY TFGGQDPSI ITPSTDKSIYDKVLQNFRGIV DRLNKVLVC I SDPN I N I NI YKNKFKDKYKF VEDSEGKYSIDVESFDKLYKSLMFGFTETN 



Original_Seq40 
Substitute_Seq40 
Amended_Seq40 
Whelan_M81186 



IAENYKIKTRASYFSDSLPPVKIKNLLDNE IYTIEEGFNISDKDMEKEYRGQNKAINKQA YEEISKEHLAVYKIQMCKSVKAPGICIDVD NEDLFFIADKNSFSDDLSKNERIEYNTQSN 



Original_Seq40 
Substitute_Seq40 
Amended_Seq40 
Whelan_M81186 



YIENDFPINELILDTDLISKIELPSENTES LTDFNVDVPVYEKQPA I KK I FTDENT I FQ Y LYSQTFPLDIRDISLTSSFDDALLFSNKVY S FFSMDY IKTANKWEAGLFAGWVKQ I VND 



Original_Seq40 
Substitute_Seq40 
Amended_Seq40 
Whelan_M81186 



FVIEANKSNTMDKIADISLIVPYIGLALNV GNETAKGNFENAFE I AGAS I LLEF I PELL I PWGAFLLES Y I DNKNK I I KT I DNALTKRN EKWSDMYGLIVAQWLSTVNTQFYTIKEGMY 



Original_Seq40 
Substitute_Seq40 
Amended_Seq40 
Whelan_M81186 



KALNYQAQALEEI IKYRYNI YSEKEKSNIN I D FND I NS KLNEG I NQA I DN I NNF I NGCSV SYLMKKM I PLAVEKLLDFDNTLKKNLLNYI DENKLYL I GSAE YEKS KVNKYLKT I M P FDL 



Original_Seq40 
Substitute_Seq40 
Amended_Seq40 
Whelan_M81186 



841 870 871 900 901 930 931 960 

FNKYNSEILNNI ILNLRY KDNNL I DLSGYGAKVEVYDGVELNDKNQFK LTS S ANS K I RVTQNQN 1 1 FNSVFLDFSVSF W I R I PKYKNDG I QNY I HNE YT 1 I NCM KNNS 

FNKYNSEILNNI ILNLRY KDNNL I DLSGYGAKVEVYDGVELNDKNQFK LTS SANS K I RVTQNQN I I FNSVFLDFSVSF WIRIPKYKNDGIQNYIHNEYTI I NCM KNNS 

FNKYNSEILNNI ILNLRY KDNNL I DLSGYGAKVEVYDGVELNDKNQFK LTS S ANS K I RVTQNQN I I FNSVFLDFSVSF WIRIPKYKNDGIQNYIHNEYTI I NCM KNNS 

SIYTNDTILIEMFNKYNSEILNNI ILNLRY Kp^JDLSGYGAKVEWDGVELNDKNQFK LTS SANS K I RVJQNQNI I FNSVFLDFSVSF WIRIPKYKNDGIQNYIHNEYTI INCMKNNS 



Original_Seq40 
Substitute_Seq40 
Amended_Seq40 
Whelan_M81186 



961 990 
GWKI S I RGNR l|WTLI D I NGKTKSVFFE YN 
GWKISI RGNR I |WTL I D I NGKTKSVF FE YN 
GWKI S I RGNR iMwTL I D I NGKTKS V F FE YN 
GWKI SI RGNR I §WTL I D I NGKTKSVF F E YN 
!*;^;*>TSSS;4?S!*' ■***> *'* * * : * * * * * * * 



991 1020 1021 1050 1051 1080 

IREDISEYINRWFFVTITNNLNNAKIYING KLESNTD I KD I REV I ANGE I I FKLDGD I DR TQFIWMKYFSIFNTELSQSNIEERYKIQSY 
IREDISEYINRWFFVTITNNLNNAKIYING KLESNTD I KD I REV I ANGE I I FKLDGD I DR TQFIWMKYFSIFNTELSQSNIEERYKIQSY 
IREDISEYINRWFFVTITNNLNNAKIYING KLESNTD I KD I REV I ANGE I I FKLDGD I DR TQFIWMKYFSIFNTELSQSNIEERYKIQSY 
IREDIS E Y I NRWFFVT I TNNLNNAK IYING KLESNTD I KD I REV I ANGE I I FKLDGD I DR TQFIWMKYFSIFNTELSQSNIEERYKIQSY 



*.**.* * * * * ** V* 



Original_Seq40 
Substitute_Seq40 
Amended_Seq40 
Whelan_M81186 



1081 1110 1111 1140 1141 1170 1171 1200 

SEYLKDFWGNPLMYNKEYYMFNAGNKNSYI KLKKDSPVGEILTRSKYNQNSKYINYRDLY I GEKF I I RRKSNSQS I NDD I VRKED Y I YLD FFNLNQE|RVYTYKgFKKEEEKLFLAPISD 
SEYLKDFWGNPLMYNKEYYMFNAGNKNSY I KLKKDSPVGEILTRSKYNQNSKYINYRDLY I GEKF I I RRKSNSQS I NDD I VRKEDY I YLD FFNLNQE&VYTYkIfKKEEEKLFLAPISD 
SEYLKDFWGNPLMYNKEYYMFNAGNKNSYI KLKKDSPVGEILTRSKYNQNSKYINYRDLY IGEKFI I RRKSNSQS INDD I VRKED Y I YLD FFNI*NQEPVYTYk1fKKEEEKLFLAPISD 
£^J^™?£™J^™ FNAGNKNS Y I KLKKDSPVGEILTRSKYNQNSKYINYRDLY IGEKFI I RRKSNSQS INDD I VRKED Y I YLD F FNLNQ E^VYTYK^ FJCKE^EKL FJi^P I SD 



Original_Seq40 
Substitute_Seq40 
Amended_Seq40 
Whelan_M81186 



1201 1230 1231 1260 1261 1290 1291 

SDEFYNTIQIKEYDEQPTYSCQLLFKKDEE STDEIGLIGIHRFYESGIVFEEYKDYFCIS KWYLKEVKRKPYNLKLGCNWQFIPKDEGWT E 
SDEFYNTIQIKEYDEQPTYSCQLLFKKDEE STDEIGLIGIHRFYESGIVFEEYKDYFCIS KWYLKEVKRKPYNLKLGCNWQFIPKDEGWT E 
SDEFYNTIQIKEYDEQPTYSCQLLFKKDEE STDEIGLIGIHRFYESGIVFEEYKDYFCIS KWYLKEVKRKPYNLKLGCNWQFIPKDEGWT E 
SDEFYNTIQIKEYDEQPTYSCQLLFKKDEE STDEIGLIGIHRFYESGIVFEEYKDYFCIS KWYLKEVKRKPYNLKLGCNWQFIPKDEGWT E 
ALIGNMENT 5: SEQ ID NO: 41 






Original_Seq41 
Substitute_Seq41 
Amended_Seq41 
Thompson_X52066 



MQFVNKQFNYKDPVNGVD I AY I K I PNVGQM QPVKAFKI HNK I WV I PERDTFTNPEEGDLN PPPEAKQVPVSYYDSTYLSTDNEKDNYLKG VTKLFERIYSTDLGRMLLTSIVRGIPFWGG 



Original_Seq41 
Substitute_Seq41 
Amended_Seq41 
Thompson_X52066 



STIDTELKVIDTNCINVIQPDGSYRSEELN LVI IGPSADI IQFECKSFGHEVLNLTRNGY GSTQ Y I RFS PDFTFGFEESLEVDTNPLLGA GKFATDPAVTLAHEL IHAGHRLYG I A I NPN 



Original_Seq41 
Substitute_Seq41 
Amended_Seq41 
Thompson_X52066 



RVFKVNTNAYYEMSGLEVS FEELRTFGGHD AKFIDSLQENEFRLYYYNKFKDIASTLNKA KS I VGTTASLQ YM KNV FKE KYLLS EDTSGK FS VDKLKFDKLYKMLTE I YTEDNFVKFFKV 



Original_Seq41 
Substitute_Seq41 
Amended_Seq41 
Thompson_X52066 



361 390 391 420 421 450 451 480 

" " - AL NDLCIKVNNWDLFFSPSEDNFTNDLNKGEE 

" - - AL NDLCIKVNNWDLFFSPSEDNFTNDLNKGEE 

- AL NDLCIKVNNWDLFFSPSEDNFTNDLNKGEE 

LNRKTYLNFDKAVFK I N I VPKVNYT I YDGF NLRNTNLAANFNGQNTE INNMNFTKLKNFT GLFEFYKLLCVRGI ITSKTKSLDKGYNKAL NDLCIKVNNWDLFFSPSEDNFTNDLNKGEE 



Original_Seq41 
Substitute_Seq41 
Amended_Seq41 
Thompson_X52066 



481 510 511 540 541 570 571 600 

I TSDTN I EAAEEN I SLDL I QQY YLTFNFDN E PENI S I ENLSSD I I GQLELM PN I ER F PNG KKYELDKYTMFHYLRAQEFEHGKSRIALTN SVNEALLNPSRVYTFFSSDYVKKVNKATEA 

I TSDTN I EAAE EN I SLDL I QQ YYLTFNFDN EPENISIENLSSDI IGQLELMPNIERFPNG KKYELDKYTMFHYLRAQEFEHGKSRIALTN SVNEALLNPSRVYTFFSSDYVKKVNKATEA 

ITSDTNIEAAEENISLDLIQQYYLTFNFDN EPENISIENLSSDI IGQLELMPNIERFPNG KKYELDKYTMFHYLRAQEFEHGKSRIALTN SVNEALLNPSRVYTFFSSDYVKKVNKATEA 

ITSpTNIE^EHnS^LDLIQQYYLTFNFDN EPENISIENLSSDI IGQLELMPNIERFPNG KKYELDKYTMFHYLRAQEFEHGKSRIALTN SVNEALLNPSRVYTFFSSDYVKKVNKATEA 

> :*.* ***** .*+. *.*.:* * * * *** *";* * * * * * **.* * * * • * * * * * * ** •**•** * * * *7* *> *.*>y*~* * * *. *.*>"* *7*"* r * f ** ttT^**** 'iViv*-*^ 



Original_Seq41 
Substitute_Seq41 
Amended_Seq41 
Thompson_X52066 



601 630 
AMFLGWVEQLVYDFTDETSEVSTTDKIADI 
AMFLGWVEQLVYDFTDETSEVSTTDK I AD I 
AMFLGWVEQLVYDFTDETS EVSTTDK I AD I 
AMFLGWVEQLVYDFTDETSEVSTTDKIADI 



***.***** 



* * * * * * 



631 660 661 690 691 720 

TIIIPYIGPALNIGImLYKDDFVGALIFSG AVILLEFIPEIAIPVLGTFALVSYIANKVL TVQTIDNALSKRNEKWDEVYKYIVTNWLAK 

TIIIPYIGPALNIGJiLYKDDFVGALIFSG AVILLEFIPEIAIPVLGTFALVSYIANKVL TVQTIDNALSKRNEKWDEVYKYIVTNWLAK 

TI IIPYIGPALNIGgMLYKDDFVGALIFSG AVILLEFIPEIAIPVLGTFALVSYIANKVL TVQTIDNALSKRNEKWDEVYKYIVTNWLAK 

TI 1 1 PY 19^^H^ r L Ti!^.^.^i££HH ^YJ. L ^ E £H E 1 AI PVLGTFALVS Y I ANKVL TVQT I DNALSKRNEKWDEVYKYI VTNWLAK 

*'*'* * * * * *. ;*> * *> .*iC*^*::*:*\*\*^ * * * **>.'** *'*T 



Original_Seq41 
Substitute_Seq41 
Amended_Seq41 
Thompson_X52066 



721 750 751 780 781 810 811 

VNTQ I DL I RKKMKEALENQAEATKA I I NYQ YNQYTEEEKNNINFNIDDLSSKLNESINKA MINI NKFLNQCS VS YLMNSM I PYGVKRLED FDASLKDALLKY l| 

VNTQ I DL I RKKMKEAL ENQAEATKA I I NYQ YNQYTEEEKNNINFNIDDLSSKLNESINKA M INI NKFLNQCSVSYLMNSM I PYGVKRLED FDASLKDALLKY if 

VNTQ I DL I RKKMKEAL ENQAEATKA I I NYQ YNQYTEEEKNNINFNIDDLSSKLNESINKA MININKFLNQCSVS YLMNSM I PYGVKRLED FDASLKDALLKY Ip 

Y?T.9.f P L * RKKMKEALENQAEATKA 1 1 NYQ YNQYTEEEKNNINFNIDDLSSKLNESINKA MININKFLNQCSVS YLMNSM I PYGVKRLED FDASLKDALLKY l| 






* ******* *,*.*.*:*.*.*.'* **** 



*.*,.*' *;+:*.* * ** 



*.*„*>.*.■*.*.* **.iTSlit 



840 

3N|GTL I GQVDRLKDK 
3N|GTL I GQVDRLKDK 
3n|gTLI GQVDRLKDK 
3N§GTL I GQVDRLKDK 



Original_Seq41 
Substitute_Seq41 
Amended_Seq41 
Thompson_X52066 



841 870 
VNNTLSTD I P FQLS KYVDNQRLLSTFTE Y I 
VNNTLSTD I P FQLS KYVDNQRLLSTFTE Y I 
VNNTLSTD I PFQLS KYVDNQRLLSTFTE Y I 
VNNTLSTD I PFQLSKYVDNQRLLSTFTEY I 



« * * . . i'i' ,*; * * ■ V W * * 



871 900 901 930 931 960 

KNI INTSILNLRYESNHLIDLSRYASKINI GSKVNFDPIDKNQIQLFNLESSKIEVILKN AIVYNSMYENFSTSFWIRIPKYFNSISLNN 

KNI INTSILNLRYESNHLIDLSRYASKINI GSKVNFDPIDKNQIQLFNLESSKIEVILKN AIVYNSMYENFSTSFWIRIPKYFNSISLNN 

KNI INTSILNLRYESNHLIDLSRYASKINI GSKVNFDPIDKNQIQLFNLESSKIEVILKN AIVYNSMYENFSTSFWIRIPKYFNSISLNN 

KNI INTSILNLRYESNHLIDLSRYASKINI GSKVNFDPIDKNQIQLFNLESSKIEVILKN AIVYNSMYENFSTSFWIRIPKYFNSISLNN 



Original_Seq41 
Substitute_Seq41 
Amended_Seq41 
Thompson_X52066 



961 990 991 1020 1021 

EYTI INCMENNSGWKVSLNYGEI IWTLQDT QE I KQRWFKYSQM IN I SDY I NRW I FVTIT NNRLNNSKIY 
EYTI INCMENNSGWKVSLNYGEI IWTLQDT QE I KQRWFKYSQM IN I S D Y I NRW I FVT IT NNRLNNSKIY 
EYTI INCMENNSGWKVSLNYGEI IWTLQDT QE I KQRWFKYSQM IN I SDY I NRW I FVTIT NNRLNNSKIY 
^^U^^^39^ SU3YGEIlwrL ^ D,r QE I KQRWFKYSQM INI SDY INRW I FVTIT NNRLNNSKIY 



* * * * * * * * * 



1050 1051 1080 

i i ngrl i dqkp i snlgn i ha snn i mfkldgcrdthry i w i kyfnlfdkel 
j i ngrl i dq kp i snlgn i ha snnim fkldgcrdthr y i w i kyfnl fd kel 
jingrlidqkpisnlgniha snnimfkldgcrdthryiwi kyfnlfdkel 
jlj^rlipqk^i^^niha snn i mfkldgcrdthry i wi kyfnlfdkel 
"* * * * *>^t*^* **^ *^ : * : *;*/'^ 



Original_Seq41 
Substitute_Seq41 
Amended_Seq41 
Thompson_X52066 



1081 1110 1111 

NEKEIKDLYDNQSNSGILKDFWGDYLQYDK PYYMl 
NEKEIKDLYDNQSNSGILKDFWGDYLQYDK PYYM| 
NEKEIKDLYDNQSNSGILKDFWGDYLQYDK PYYM| 
NEKE I KDLYDNQSNSG I LKD FWGDYLQYDK P YYmI 



* v *"i,*^*:V*' s *>^ 



1140 

LYDPNKYVDVNNVG I RGYM YLKGP 
lLYDPNKYVDVNNVG I RGYM YLKGP 
L YD PNKYVDVNNVG I RG YM YLKGP 
LYDPNKYVDVNNVG I RGYM YLKGP 



1141 1170 1171 1200 

RGSVMTTNI YLNSSLYRGTKFI IKK§ASGN KDNIVRNNDRVYINWVKNKEYRLATNASQ 
RGSVMTTN I YLNSSLYRGTKF I I KKMASGN KDNIVRNNDRVYINWVKNKEYRLATNASQ 
RGSVMTTNI YLNSSLYRGTKFI I KkIaSGN KDNIVRNNDRVYINWVKNKEYRLATNASQ 
^^X^~££5£^£O.L K 5&£?J P N I VRNNDRVYIJ^VVKNKEYRLATNASQ 



1201 1230 1231 1260 1261 

Original_Seq41 AGVEK I LSALE I PDVGNLSQVWMKSKNDQ GI TNKCKMNLQDNNGND I GF I GFHQ FNN I A KLVASNWYNI 

Substitute_Seq41 AGVEK I LSALE I PDVGNLSQVWMKSKNDQ G I TNKCKMNLQDNNGND IGF I GFHQFNN I A KLVASNWYNS 

Amended_Seq41 AGVEKI LSALE I PDVGNLSQVWMKSKNDQ G I TNKCKMNLQDNNGND IGF I GFHQFNN I A KLVASNWYNS 

Thompson_X52066 AGVEK I LSALE IPDVGNLSQWVMKSKNDQ G I TNKCKMNLQDNNGND IGF IGFHQFNN I A KLVASNWYNg 



* :*:*.*.*;*,*. 



1290 1297 

lERSSRTLGCSWEFIPVDD GWGERPL 

SeRSSRTLGCSWEFIPVDD GWGERPL 

jlERSSRTLGCSWEFIPVDD GWGERPL 

SeRSSRTLGCSWEFIPVDD GWGERPL 




ALIGNMENT 6: SEQ ID NO: 42 



Original_Seq42 
Substitute_Seq42 
Amended_Seq42 
Whelan_M81186 



MPVTINNFNYNDPIDNNNI IHMEPPFARGT GRYYKAFKITDRIWI I PERYTFGYKPEDFN KSSGIFNRDVCEYYDPDYLNTNDKKNIFLQ TMIKLFNRIKSKPLGEKLLEMI INGI PYLG 



Original_Seq42 
Substitute_Seq42 
Amended_Seq42 
Whelan_M81186 



DRRVPLEEFNTNIASVTVNKLISNPGEVER KKGIFANLI I FGPGPVLNENETIDIGIQNH FASREGFGGIMQMKFCPEYVSVFNNVQENK GAS IFNRRGYFSDPALILMHELIHVLHGLY 



Original_Seq42 
Substitute_Seq42 
Amended_Seq42 
Whelan_M81186 



GI KVDDLP I VPNEKKFFMQSTDA I QAEELY TFGGQDPSI ITPSTDKS IYDKVLQNFRGIV DRLNKVLVC I SDPNI N INI YKNKFKDK YKF VEDS EGKYS I DVE S FDKLYKSLM FGFTETN 



Original_Seq42 
Substitute_Seq42 
Amended_Seq42 
Whelan_M81186 



361 390 391 420 421 450 451 480 

- APGICIDVD NEDLFFIADKNSFSDDLSKNERIEYNTQSN 

- - - - -APGICIDVD NEDLFFIADKNSFSDDLSKNERIEYNTQSN 

-- APGICIDVD NEDLFFIADKNSFSDDLSKNERIEYNTQSN 

IAENYKIKTRASYFSDSLPPVKIKNLLDNE IYTIEEGFNISDKDMEKEYRGQNKAINKQA YEEISKEHLAVYKIQMCKSVKAPGICIDVD NEDLFFIADKNSFSDDLSKNERIEYNTQSN 

*..* * * * * * * * * ******** *,*.* * : * *^">;****:* . V*^*"i**> V*V 



Original_Seq42 
Substitute_Seq42 
Amended_Seq42 
Whelan_M81186 



481 510 511 540 541 570 571 600 

YIENDFPINELILDTDLISKIELPSENTES LTDFNVDVPVYEKQPAIKKI FTDENTIFQY LYSQTF PLD I RD I S LTS S FDDALL FSNKVY SFFSMDYIKTANKWEAGLFAGWVKQIVND 

YIENDFPINELILDTDLISKIELPSENTES LTDFNVDVPVYEKQPAIKKI FTDENTIFQY LYSQTF PLD I RD I S LTS S FDDALLFSNKVY SFFSMDYIKTANKWEAGLFAGWVKQIVND 

YIENDFPINELILDTDLISKIELPSENTES LTDFNVDVPVYEKQPAIKKI FTDENTIFQY LYSQTF PLD I RD IS LTS S FDDALLFSNKVY SFFSMDYIKTANKWEAGLFAGWVKQIVND 

I I H?F P I?E.^I L PTE L 1 : SKI ELP J E .^JS LTD FNVD V PVYEKQPA I KKI FTDENT I FQY LYSQTFPLDI RD I SLTSS FDDALLFSNKVY SFFSMDY I KTANKWEAGLFAGWVKQ I VND 




Original_Seq42 
Substitute_Seq42 
Amended_Seq42 
Whelan_M81186 



601 630 631 660 661 690 691 720 

FV I EANKSNTMDK I AD I SL I VP Y IGLALNV GNETAKGNFENAFE I AGAS I LLEF I PELL I PWGAFLLE S Y I DNKNK I I KT I DNALTKRN EKWSDMYGLIVAQWLSTVNTQFYTIKEGMY 
FVIEANKSNTMDKIADISLIVPYIGLALNV GNETAKGNFENAFE I AGAS I LLEF I PELL I PWGAFLLESYIDNKNKI IKTIDNALTKRN EKWSDMYGLIVAQWLSTVNTQFYTIKEGMY 
FV I EANKSNTMDK IAD I SL I VPY IGLALNV GNETAKGNFENAFE I AGAS I LLEF I PELL I PWGAFLLESYIDNKNKI IKTIDNALTKRN EKWSDMYGLIVAQWLSTVNTQFYTIKEGMY 
HJEl^}!™!^^JL^l v ?l I .9ii ALNV GNETAKGNFENAFE I AGAS I LLEF I PELL I PWGAFLLESY IDNKNK 1 1 KTI DNALTKRN EKWSDMYGL IVAQWLSTVNTQFYTI KEGMY 



** ********* 



Original_Seq42 
Substitute_Seq42 
Amended_Seq42 
Whelan_M81186 



721 750 
KALNYQAQALEEI IKYRYNI YSEKEKSNIN 
KALNYQAQALEEI IKYRYNIYSEKEKSNIN 
KALNYQAQALEEI IKYRYNI YSEKEKSNIN 
KALNYQAQALEE I I KYRYN IYSEKEKSNIN 



751 780 781 810 811 840 

IDFNDINSKLNEGINQAIDNINNFINGCSV SYLMKKMI PLAVEKLLDFDNTLKKNLLNYI DENKLYLIGSAEYEKSKVNKYLKTIMPFDL 

I DFND I NSKLNEG INQA I DN INNF I NGCSV SYLMKKMI PLAVEKLLDFDNTLKKNLLNYI DENKLYLIGSAEYEKSKVNKYLKTIMPFDL 

IDFNDINSKLNEGINQAIDNINNFINGCSV SYLMKKMI PLAVEKLLDFDNTLKKNLLNYI DENKLYLIGSAEYEKSKVNKYLKTIMPFDL 

IpFNpiNSKLNEGINQAIDNINNFINGCSV SYLMKKM I PLAV EKLLDFDNTLKKNLLNY I DENKLYLIGSAEYEKSKVNKYLKTIMPFDL 



****** *.* * *. *.* * 



* *"* « **"**.+"** 



Original_Seq42 
Substitute_Seq42 
Amended_Seq42 
Whelan_M81186 



841 870 871 900 901 

SIYTNDTILIEMFNKYNSEILNNI ILNLRY KDNNL I DLSGYGAKVEVYDGVELNDKNQFK LTSSANSKIR 
SIYTNDTILIEMFNKYNSEILNNI ILNLRY KDNNL I DLSGYGAKVEVYDGVELNDKNQFK LTSSANSKIR 
SIYTNDTILIEMFNKYNSEILNNI ILNLRY KDNNL I DLSGYGAKVEVYDGVELNDKNQFK LTSSANSKIR 
S I YTODT I L I EM F£KYNS E I LNN 1 1 LNLRY KDNNL I DLSGYGAKVEVYDGVELNDKNQ FK LTSSANSKIR 



930 931 960 
TQNQNI I FNSVFLDFSVS F W I R I PKYKNDG IQNY IHNE YT 1 1 NCMKNNS 
TQNQNI I FNSVFLDFSVS F W I R I PKYKNDG I QNY I HNE YT I I NCMKNNS 
TQNQN I I FNSVFLDFSVS F W I R I PKYKNDG I QNY I HNE YT I I NCM KNNS 
TQNWIIFNSVFLDFSVSF W I R I PKYKNDGI QNY I HNE YTI INCMKNNS 
w ^^*'*"* *>>'*.* * 



Original_Seq42 
Substitute_Seq42 
Amended_Seq42 
Whelan_M81186 



961 990 991 1020 1021 1050 1051 1080 

GWKISIRGpRI IWTLIDINGKTKSVFFEYN IREDISEYINRWFFVTITNNLNNAKIYING KLESNTDIKDIREVIANGEI IFKLDGDIDR TQFIWMKYFSIFNTELSQSNIEERYKIQSY 
GWKISIRGJRIIWTLIDINGKTKSVFFEYN IREDISEYINRWFFVTITNNLNNAKIYING KLESNTDIKDIREVIANGEI IFKLDGDIDR TQFIWMKYFSIFNTELSQSNIEERYKIQSY 
GWKISIRGNRIIWTLIDINGKTKSVFFEYN IREDISEYINRWFFVTITNNLNNAKIYING KLESNTDIKDIREVIANGEI IFKLDGDIDR TQFIWMKYFSIFNTELSQSNIEERYKIQSY 
G J1 K J. S l^^ 1 !^}. D 1 NGKTKSVFFEYN I RED I SEY I NRWFFVT ITNNLNNAKI Y I NG KLESNTD I KD I REVI ANGE 1 1 FKLDGDI DR TQFI WMKYFS I FNTELSQSNI EERYKIQSY 



*********** * * 



* * ******.*** 



Original_Seq42 
Substitute_Seq42 
Amended_Seq42 
Whelan_M81186 



1081 1110 
seylkdfwgnplmynkeyymfnagnknsyg 
seylkd fwgnplm ynke yym fnagnkns y§ 
seylkdfwgnplmynkeyymfnagnknsyI 
s e ylkd fwgnplm ynke yym fnagnkns y§ 



1111 1140 1141 1170 1171 1200 

I KLKKDS PVGE I LTRS KYNQNS KY I NYRDL YIGEKFI IRRKSNSQSINDDIVRKEDYIYL DFFNLNQEWRVYTYKYFKKEEE|LFLAPIS 
I KLKKDS PVGE I LTRS KYNQNS KY I NYRDL YIGEKFI IRRKSNSQSINDDIVRKEDYIYL DFFNLNQEWRVYTYKYFKKEEeIlFLAPIS 
I KLKKDS PVGE I LTRS KYNQNS KYI NYRDL YIGEKFI IRRKSNSQSINDDIVRKEDYIYL DFFNLNQEWRVYTYKYFKKEEeIlFLAPIS 

I ^^^y^J} J J^3^2^yj^^ 1 ' l^HH^£^BB^Rn RKEDYIY1 ' PFFNLNQEWRvyTYKYFKKEEEiLFLAPi S 

""""Igjig^^ *f*^**#^#;+S5^ *°*:*V;***T* r 



Original_Seq42 
Substitute_Seq42 
Amended_Seq42 
Whelan_M81186 



1201 1230 1231 

DSDEFYNTIQIKEYDEQPTYSCQLLFKKDE ESTDEIGLIGIHRFYESGIVFE1 

DSDEFYNTIQIKEYDEQPTYSCQLLFKKDE ESTDEIGLIGIHRFYESGIVFE] 

DSDEFYNTIQIKEYDEQPTYSCQLLFKKDE ESTDEIGLIGIHRFYESGIVFE) 

DSDEFYNTIQIKEYDEQPTYSCQLLFKKDE ESTDEIGLIGIHRFYESGIVFE] 



* ** * * ** 




[WYLsEVKRKPYNLKLGCNWQF I PKDEGW TE 
I YLKEVKRKP YNLKLGCNWQF I PKDEGW TE 
'YLKEVKRKPYNLKLGCNWQFI PKDEGW TE 
'YLgEVKRKPYNLKLGCNWQF I PKDEGW TE 



t * * * * * * * 
/organism="Clostridium botulinum" 
/mol_type="genomic DNA" 
/strain="sub sp. type A, NCTC2916" 
IPERDTFTNPEEGDLNPPPEAKQVPVSYYDSTYLSTDNEKDNYLKGVTKLFERIYSTD 
LGRMLLTSIVRGIPFWGGSTIDTELKVIDTNCINVIQPDGSYRSEELNLVIIGPSADI 
IQFECKSFGHEVLNLTRNGYGSTQYIRFSPDFTFGFEESLEVDTNPLLGAGKFATDPA 
VTLAHELIHAGHRLYGIAINPNRVFKVNTNAYYEMSGLEVSFEELRTFGGHDAKFIDS 
LQENEFRLYYYNKFKDIASTLNKAKSIVGTTASLQYMKNVFKEKYLLSEDTSGKFSVD 
KLKFDKLYKMLTEIYTEDNFVKFFKVLNRKTYLNFDKAVFKINIVPKVNYTIYDGFNL 
RNTNLAANFNGQNTEINNMNFTKLKNFTGLFEFYKLLCVRGIITSKTKSLDKGYNKAL 
NDLCIKVNNWDLFFSPSEDNFTNDLNKGEEITSDTNIEAAEENISLDLIQQYYLTFNF 
DNEPENISIENLSSDIIGQLELMPNIERFPNGKKYELDKYTMFHYLRAQEFEHGKSRI 
ALTNSVNEALLNPSRVYTFFSSDYVKKVNKATEAAMFLGWVEQLVYDFTDETSEVSTT 
DKIADITIIIPYIGPALNIGNMLYKDDFVGALIFSGAVILLEFIPEIAIPVLGTFALV 
SYIANKVLTVQTIDNALSKRNEKWDEVYKYIVTNWLAKVNTQIDLIRKKMKEALENQA 
EATKAIINYQYNQYTEEEKNNINFNIDDLSSKLNESINKAMININKFLNQCSVSYLMN 
SMIPYGVKRLEDFDASLKDALLKYIYDNRGTLIGQVDRLKDKVNNTLSTDIPFQLSKY 
VDNQRLLSTFTEYIKNIINTSILNLRYESNHLIDLSRYASKINIGSKVNFDPIDKNQI 
QLFNLESSKIEVILKNAIVYNSMYENFSTSFWIRIPKYFNSISLNNEYTIINCMENNS 
GWKVSLNYGEIIWTLQDTQEIKQRVVFKYSQMINISDYINRWIFVTITNNRLNNSKIY 
INGRLIDQKPISNLGNIHASNNIMFKLDGCRDTHRYIWIKYFNLFDKELNEKEIKDLY 
DNQSNSGILKDFWGDYLQYDKPYYMLNLYDPNKYVDVNNVGIRGYMYLKGPRGSVMTT 
NIYLNSSLYRGTKFIIKKYASGNKDNIVRNNDRVYINVVVKNKEYRLATNASQAGVEK 
ILSALEIPDVGNLSQVVVMKSKNDQGITNKCKMNLQDNNGNDIGFIGFHQFNNIAKLV 
ASNWYNRQIERSSRTLGCSWEFIPVDDGWGERPL" 

misc_feature 4042..4087 
/note="dyad symmetry" 
misc_feature 4104..4136 
/note="dyad symmetry" 
misc_feature 4158..4188 
/note="dyad symmetry" 

ORIGIN 

1 tcaaagtatt tgtatttatg gtcatttaaa taattaataa tttaattaat tttaaatatt 
61 ataagaggtg ttaaatatgc aatttgttaa taaacaattt aattataaag atcctgtaaa 
121 tggtgttgat attgcttata taaaaattcc aaatgtagga caaatgcaac cagtaaaagc 
181 ttttaaaatt cataataaaa tatgggttat tccagaaaga gatacattta caaatcctga 
241 agaaggagat ttaaatccac caccagaagc aaaacaagtt ccagtttcat attatgattc 
301 aacatattta agtacagata atgaaaaaga taattattta aagggagtta caaaattatt 
361 tgagagaatt tattcaactg atcttggaag aatgttgtta acatcaatag taaggggaat 
421 accattttgg ggtggaagta caatagatac agaattaaaa gttattgata ctaattgtat 
481 taatgtgata caaccagatg gtagttatag atcagaagaa cttaatctag taataatagg 
541 accctcagct gatattatac agtttgaatg taaaagcttt ggacatgaag ttttgaatct 
601 tacgcgaaat ggttatggct ctactcaata cattagattt agcccagatt ttacatttgg 
661 ttttgaggag tcacttgaag ttgatacaaa tcctctttta ggtgcaggca aatttgctac 
721 agatccagca gtaacattag cacatgaact tatacatgct ggacatagat tatatggaat 
781 agcaattaat ccaaataggg tttttaaagt aaatactaat gcctattatg aaatgagtgg 
841 gttagaagta agctttgagg aacttagaac atttggggga catgatgcaa agtttataga 
901 tagtttacag gaaaacgaat ttcgtctata ttattataat aagtttaaag atatagcaag 
961 tacacttaat aaagctaaat caatagtagg tactactgct tcattacagt atatgaaaaa 
1021 tgtttttaaa gagaaatatc tcctatctga agatacatct ggaaaatttt cggtagataa 
1081 attaaaattt gataagttat acaaaatgtt aacagagatt tacacagagg ataattttgt 
1141 taagtttttt aaagtactta acagaaaaac atatttgaat tttgataaag ccgtatttaa 
1201 gataaatata gtacctaagg taaattacac aatatatgat ggatttaatt taagaaatac 
1261 aaatttagca gcaaacttta atggtcaaaa tacagaaatt aataatatga attttactaa 
1321 actaaaaaat tttactggat tgtttgaatt ttataagttg ctatgtgtaa gagggataat 
1381 aacttctaaa actaaatcat tagataaagg atacaataag gcattaaatg atttatgtat 
1441 caaagttaat aattgggact tgttttttag tccttcagaa gataatttta ctaatgatct 
1501 aaataaagga gaagaaatta catctgatac taatatagaa gcagcagaag aaaatattag 
1561 tttagattta atacaacaat attatttaac ctttaatttt gataatgaac ctgaaaatat 
1621 ttcaatagaa aatctttcaa gtgacattat aggccaatta gaacttatgc ctaatataga 
1681 aagatttcct aatggaaaaa agtatgagtt agataaatat actatgttcc attatcttcg 
1741 tgctcaagaa tttgaacatg gtaaatctag gattgcttta acaaattctg ttaacgaagc 
1801 attattaaat cctagtcgtg tttatacatt tttttcttca gactatgtaa agaaagttaa 
1861 taaagctacg gaggcagcta tgtttttagg ctgggtagaa caattagtat atgattttac 
1921 cgatgaaact agcgaagtaa gtactacgga taaaattgcg gatataacta taattattcc 
1981 atatatagga cctgctttaa atataggtaa tatgttatat aaagatgatt ttgtaggtgc 
2041 tttaatattt tcaggagctg ttattctgtt agaatttata ccagagattg caatacctgt 
2101 attaggtact tttgcacttg tatcatatat tgcgaataag gttctaaccg ttcaaacaat 
2161 agataatgct ttaagtaaaa gaaatgaaaa atgggatgag gtctataaat atatagtaac 
2221 aaattggtta gcaaaggtta atacacagat tgatctaata agaaaaaaaa tgaaagaagc 
2281 tttagaaaat caagcagaag caacaaaggc tataataaac tatcagtata atcaatatac 
2341 tgaggaagag aaaaataata ttaattttaa tattgatgat ttaagttcga aacttaatga 
2401 gtctataaat aaagctatga ttaatataaa taaatttttg aatcaatgct ctgtttcata 
2461 tttaatgaat tctatgatcc cttatggtgt taaacggtta gaagattttg atgctagtct 
2521 taaagatgca ttattaaagt atatatatga taatagagga actttaattg gtcaagtaga 
2581 tagattaaaa gataaagtta ataatacact tagtacagat ataccttttc agctttccaa 
2641 atacgtagat aatcaaagat tattatctac atttactgaa tatattaaga atattattaa 
2701 tacttctata ttgaatttaa gatatgaaag taatcattta atagacttat ctaggtatgc 
2761 atcaaaaata aatattggta gtaaagtaaa ttttgatcca atagataaaa atcaaattca 
2821 attatttaat ttagaaagta gtaaaattga ggtaatttta aaaaatgcta ttgtatataa 
2881 tagtatgtat gaaaatttta gtactagctt ttggataaga attcctaagt attttaacag 
2941 tataagtcta aataatgaat atacaataat aaattgtatg gaaaataatt caggatggaa 
3001 agtatcactt aattatggtg aaataatctg gactttacag gatactcagg aaataaaaca 
3061 aagagtagtt tttaaataca gtcaaatgat taatatatca gattatataa acagatggat 
3121 ttttgtaact atcactaata atagattaaa taactctaaa atttatataa atggaagatt 
3181 aatagatcaa aaaccaattt caaatttagg taatattcat gctagtaata atataatgtt 
3241 taaattagat ggttgtagag atacacatag atatatttgg ataaaatatt ttaatctttt 
3301 tgataaggaa ttaaatgaaa aagaaatcaa agatttatat gataatcaat caaattcagg 
3361 tattttaaaa gacttttggg gtgattattt acaatatgat aaaccatact atatgttaaa 
3421 tttatatgat ccaaataaat atgtcgatgt aaataatgta ggtattagag gttatatgta 
3481 tcttaaaggg cctagaggta gcgtaatgac tacaaacatt tatttaaatt caagtttgta 
3541 tagggggaca aaatttatta taaaaaaata tgcttctgga aataaagata atattgttag 
3601 aaataatgat cgtgtatata ttaatgtagt agttaaaaat aaagaatata ggttagctac 
3661 taatgcatca caggcaggcg tagaaaaaat actaagtgca ttagaaatac ctgatgtagg 
3721 aaatctaagt caagtagtag taatgaagtc aaaaaatgat caaggaataa caaataaatg 
3781 caaaatgaat ttacaagata ataatgggaa tgatataggc tttataggat ttcatcagtt 
3841 taataatata gctaaactag tagcaagtaa ttggtataat agacaaatag aaagatctag 
3901 taggactttg ggttgctcat gggaatttat tcctgtagat gatggatggg gagaaaggcc 
3961 actgtaatta atctcaaact acatgagtct gtcaagaatt ttctgtaaac atccataaaa 
4021 attttaaaat taatatgttt aagaataact agatatgagt attgctatgc taatatctag 
4081 ttattttaat ttattcaata ttattacagt aagaaaaaat actattttta ttgtaaatac 
4141 aagtttagtg gtatatctca taaatgatac aagatatcat tataatgatt ttgcaaatta 
4201 tagttttgaa taaatatatt tacagtattt ttgaaatgat aataattact tcaaattctt 
4261 tagtataatt ttttaatgtc ttaattttta ca 
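
The numbered sequence lines above each carry a 1-based start coordinate followed by blocks of ten bases. As a minimal sketch, assuming only that layout, the following Python joins such lines into one contiguous sequence and checks that consecutive lines are contiguous. The function name `parse_origin` is hypothetical and is not part of any database tool; the two input lines are the final two lines of the record above.

```python
def parse_origin(lines):
    """Join ORIGIN-style lines into (start_position, sequence).

    Assumption: each line holds a 1-based start coordinate followed by
    space-separated blocks of bases, and the lines are contiguous.
    """
    start = None
    chunks = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        position = int(fields[0])
        if start is None:
            start = position
        # The coordinate on each line must follow on from the bases read so far.
        expected = start + sum(len(chunk) for chunk in chunks)
        if position != expected:
            raise ValueError(f"expected position {expected}, got {position}")
        chunks.append("".join(fields[1:]))
    return start, "".join(chunks)


tail = [
    "4201 tagttttgaa taaatatatt tacagtattt ttgaaatgat aataattact tcaaattctt",
    "4261 tagtataatt ttttaatgtc ttaattttta ca",
]
start, seq = parse_origin(tail)
end = start + len(seq) - 1  # coordinate of the last base read
print(start, end, len(seq))
```

The contiguity check is a design choice for OCR'd or hand-copied records: a dropped or duplicated block shows up as a coordinate mismatch instead of silently shifting every later position.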
The complete amino acid sequence of the Clostridium botulinum type A neurotoxin, 
deduced by nucleotide sequence analysis of the encoding gene 

Daphne E. THOMPSON, John K. BREHM 1, John D. OULTRAM 1, Tracy-Jane SWINFIELD 1, Clifford C. SHONE 2, 
Tony ATKINSON 1, Jack MELLING 1 and Nigel P. MINTON 

1 Division of Biotechnology, and 2 Division of Biologics, Public Health Laboratory Service, Centre for Applied Microbiology and Research, 
Porton Down, England 

(Received October 18, 1989) - EJB 89 1261 

A 26-mer oligonucleotide probe was synthesized (based on the determined amino acid sequence of the N-terminus 
of the Clostridium botulinum type A neurotoxin, BoNT/A) and used in Southern blot analysis to construct 
a restriction map of the region of the clostridial genome encompassing BoNT/A. The detailed information obtained 
enabled the cloning of the structural gene as three distinct fragments, none of which were capable of directing the 
expression of a toxic molecule. The central portion was cloned as a 2-kb PvuII - TaqI fragment and the remaining 
regions of the light chain and heavy chain as a 2.4-kb ScaI - TaqI fragment and a 3.4-kb HpaI - PvuII fragment, 
respectively. The nucleotide sequence of all three fragments was determined and an open reading frame identified, 
composed of 1296 codons corresponding to a polypeptide of 149502 Da. The deduced amino acid sequence 
exhibited 33% similarity to tetanus toxin, with the most highly conserved regions occurring between the N-termini 
of the respective heavy chains. Conservation of Cys residues flanking the position at which the toxins are cleaved 
to yield the heavy chain and light chain allowed the tentative identification of those residues which probably form 
the disulphide bridges linking the two toxin subfragments. 
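
The codon arithmetic behind an open reading frame of this kind can be illustrated with a short sketch. The Python below is illustrative only and is not part of the published work: it builds the standard genetic code table, translates the first 45 nucleotides that follow the initiating atg in the X52066 nucleotide sequence, and computes the number of nucleotides spanned by 1296 codons. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon; a trailing partial codon is ignored."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 15 codons of the coding region, copied from the nucleotide sequence.
orf_start = "atgcaatttgttaataaacaatttaattataaagatcctgtaaat"
protein_start = translate(orf_start)
print(protein_start)

# Nucleotides spanned by the 1296 codons reported in the abstract
# (a terminating stop codon would add three more).
orf_codons = 1296
print(orf_codons * 3)
```

The translated fragment can be compared by eye with the first residues of the Thompson_X52066 row in the alignment for SEQ ID NO: 41.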



Toxigenic strains of Clostridium botulinum, an anaerobic 
spore-forming bacterium, produce one or more of seven 
immunologically distinct neurotoxins. They act primarily at 
the neuromuscular junction, causing paralysis by inhibiting 
the release of the neurotransmitter acetylcholine [1]. The 
botulinum neurotoxins are synthesised as a single polypeptide 
chain but in their most active form comprise two asymmetric 
polypeptide units: a light chain (50-59 kDa) and a heavy 
chain (85-105 kDa), linked by at least one disulphide 
bridge [2]. 
Although botulinum neurotoxins exhibit high degrees of 
similarity (when compared to each other and to the related tetanus 
and diphtheria toxins) in their molecular masses and modes of 
action, they still remain poorly characterised at the molecular 
level. Proposals for the mechanism of toxic action implicate 
three separate functional domains. The light chain has the 
pharmacological activity, while the N-terminal and C-terminal 
regions of the heavy chain mediate channel formation and toxin 
binding, respectively. These heavy-chain, toxin-binding functional 
domains have been demonstrated with diphtheria toxin [3-5] 
and with tetanus toxin [6-8]. Channel-forming activities 
have been demonstrated for the N-terminus of the heavy chain 
with C. botulinum type A [9, 10] and there is circumstantial 

Correspondence to N. P. Minion, Molecular Genetics Group, 
Division of Biotechnology, Public Health Laboratory Service, Centre 
for Applied Microbiology and Research, Porton Down, Salisbury, 
Wiltshire, SP4 0JG, England 

Note. The novel nucleotide sequence data published here has been 
deposited with the EMBL/GenBank® sequence data banks and is 
available under accession number X52066. 

The novel amino acid sequence data published here has been
deposited with the PIR sequence data bank.
Abbreviations. BoNT/A, Clostridium botulinum type A neurotoxin.



evidence for the association of binding properties with the
C-terminus of the heavy chain [10-12].

The entire nucleotide sequences of diphtheria toxin [13, 14]
and of tetanus toxin [15, 16] have been determined. Aside
from N-terminal and C-terminal sequencing of purified toxin
sub-fragments [11, 17, 18], the complete primary amino acid
sequence of a botulinum neurotoxin remains undetermined.
In the present study we present the entire amino acid sequence
of the C. botulinum type A neurotoxin (BoNT/A), deduced by
nucleotide sequence analysis of the encoding gene. The gene
was cloned in Escherichia coli as three separate sub-fragments,
none of which was capable of directing the expression of a
toxinogenic polypeptide.



MATERIALS AND METHODS 
Materials 

Restriction endonucleases, DNA modifying enzymes and
the pUC sequencing kit were obtained from Boehringer
Mannheim and used under the conditions recommended by
the supplier. [α-35S]dATP, [α-32P]dATP and [γ-32P]rATP and
the Multiprime labelling kit were purchased from Amersham
International. Zeta Probe and nitrocellulose filters were
obtained from Bio-Rad and from Schleicher & Schull,
respectively. Agarose was obtained from Seakem. All
oligonucleotides were synthesised on an Applied Biosystems 380A
DNA synthesiser by solid-phase synthesis using the phosphite
triester method [19].

Strains, vectors and growth conditions 

The E. coli host used for all recombinant manipulations
was TG1 [Δ(lac-pro) thi supE hsdD5 F' traD36 proAB+ ΔlacZ M15]
and the source of the C. botulinum type A gene was
strain NCTC 2916. The cloning vectors employed were
plasmid pMTL23 [20] and the M13 cloning vectors M13mp8
and M13mp9 [21]. E. coli was routinely cultured in L broth
(1% tryptone, 0.5% yeast extract, 0.5% NaCl), supplemented
where appropriate with ampicillin (50 µg/ml). Solidified agar
contained 2.0% (mass/vol.) agar (Difco Laboratories). C. 
botulinum was grown in USB II broth as previously described 



Nucleic acid manipulations 

Standard techniques, including cloning procedures, plasmid
preparations, and bacterial transformations have been
described elsewhere [22]. Chromosomal DNA was isolated 
from C. botulinum type A NCTC 2916 by the method of 
Marmur [23]. Cloning of C. botulinum type A toxin DNA 
fragments was performed under GMP II conditions. The gen- 
eration of M13 templates by the sonication procedure was as 
previously described [24]. Nucleotide sequencing was under- 
taken by the dideoxy-chain-termination procedure [25], se- 
quence data being analysed by the computer software of 
DNASTAR Inc. 



Screening procedures 

Restriction endonuclease digests of C. botulinum type A
total DNA were electrophoresed in 0.8% (mass/vol.) agarose
in 90 mM Tris/HCl, 90 mM boric acid, 3 mM EDTA, pH 8.3,
prior to depurination and transfer of the DNA to the Zeta
Probe membrane according to literature procedures [26]. The
26-mer probe was labelled at the 5'-end by transfer of the
labelled phosphate of [γ-32P]rATP with T4 polynucleotide
kinase [27]. DNA probes, derived from cloned fragments of the
BoNT/A gene, were radiolabelled with the Multiprime labelling
system according to the manufacturer's instructions.
Hybridisation in 0.3 M sodium chloride, 30 mM sodium citrate
(NaCl/Cit) using the oligonucleotide probe was undertaken by
gradual cooling from 80 °C to 25 °C over a 4-h period. Filters
were washed at 55 °C in 0.1% SDS and NaCl/Cit. Hybridisation
using BoNT/A sub-fragments was performed overnight at 42 °C,
essentially as described [28], except that dextran sulphate was
omitted. After hybridisation, filters were dried at 37 °C,
mounted on Whatman 3MM paper, covered with Saran Wrap and
exposed to X-ray film (Kodak X-Omat) with an intensifying
screen for 1-7 days at -70 °C. Recombinant clones were screened
by in situ colony hybridisation [29] for the presence of
BoNT/A-specific inserts with a radiolabelled oligonucleotide
probe or cloned fragments from the BoNT/A gene. Total cell
lysates of positive clones were tested in mice [30] to exclude
the possibility of toxic polypeptide expression.



RESULTS AND DISCUSSION 

Genomic mapping of the neurotoxin gene 

The first 35 amino acids of the N-terminus of the purified
49-kDa fragment (H2) of BoNT/A have previously been
determined [11]. As amino acids 6-14 represent the peptide
sequence exhibiting minimal translational degeneracy, an
encoding 26-mer oligonucleotide was synthesised (Fig. 1) to
allow the detection of the gene by DNA-DNA hybridisation
techniques. The choice of nucleotide at positions of degeneracy
was made both on the basis of the nucleotide present in
the equivalent codon of the closely related tetanus gene
[15, 16] and on the general codon bias exhibited by clostridial
genes [13]. Additionally, in observing the convention that
d(G · T) pairing is neutral in terms of base pairing, the strand
synthesised (Fig. 1) possessed the maximum number of
wobble-position dT residues. Southern hybridisation experiments
were undertaken using the radiolabelled 26-mer oligonucleotide
as a probe and genomic DNA isolated from C. botulinum type A.
The data obtained enabled the assignment
of restriction sites to the region of the clostridial genome which
encodes the neurotoxin gene (Fig. 2).
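
The choice of a minimally degenerate stretch of peptide can be illustrated with a short sketch. It multiplies the number of synonymous codons for each residue in a sliding window and reports the window with the smallest product. The example peptide is hypothetical (it is not the H2 N-terminal sequence, which is not reproduced in the text), and the function names are illustrative only. A 9-residue window spans 27 nucleotides; omitting the final wobble base would give a 26-mer, although the text does not state that the probe length was chosen this way.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def window_degeneracy(peptide):
    """Number of distinct nucleotide sequences that could encode the peptide."""
    return prod(CODON_COUNT[aa] for aa in peptide)


def least_degenerate_window(peptide, width):
    """Return (1-based start, window, degeneracy) for the least degenerate window."""
    best = None
    for i in range(len(peptide) - width + 1):
        window = peptide[i:i + width]
        degeneracy = window_degeneracy(window)
        if best is None or degeneracy < best[2]:
            best = (i + 1, window, degeneracy)
    return best


# Hypothetical peptide for illustration only (NOT the BoNT/A H2 N-terminus).
example_peptide = "ALNDLSIMKWYEFQRLLS"
start, window, degeneracy = least_degenerate_window(example_peptide, 9)
print(start, window, degeneracy)
```

In practice the degeneracy would then be reduced further by applying codon bias and G · T pairing at the remaining ambiguous positions, as described above.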

Cloning of the central portion of the BoNT/A gene

Data from Southern hybridisation experiments demonstrated
that DNA encoding amino acids 6-14 of the heavy
chain resided on a 5.0-kb PvuII restriction fragment, which
was further divided upon cleavage with TaqI into a 3.0-kb and
a 2.0-kb PvuII-TaqI fragment, the latter of which encoded
the N-terminus of the H2 subunit. Accordingly, genomic DNA
from C. botulinum type A digested with PvuII was
size-fractionated and fragments of around 5.0 kb were isolated
and subjected to further digestion with TaqI. After
size-fractionation, restricted DNA of approximately 2.0 kb was
purified and shown to contain a restriction fragment capable of
hybridising to the oligonucleotide probe. The DNA fragments
were ligated to SmaI/AccI-digested pMTL23 [20] and 1500
recombinant transformants were screened for the presence of
type A neurotoxin-specific sequences using in situ colony
hybridisation and a radiolabelled oligonucleotide probe. A total
of five positive clones were obtained, from which one was
chosen; its recombinant plasmid was designated pCBA2.
Restriction enzyme analysis of pCBA2 demonstrated the presence
of a DNA insert of the expected size.

By following the above cloning strategy, the chances of
generating an E. coli recombinant clone capable of producing
a toxigenic molecule were negligible. This conclusion was
based on a number of factors. Principal amongst these are the
observations [11] that both the heavy chain and light chain
are required for toxicity (i.e. purified preparations of either
subunit are non-toxic), that removal of the C-terminus of the
toxin results in an inactive molecule (composed of the entire
light chain and 54% of the heavy chain) and the large size of
the structural gene (predicted to be greater than 4.0 kb). Thus
on size considerations, the 2.0-kb PvuII-TaqI fragment could
not encode a toxic molecule. Furthermore, the method used
in purifying DNA fragments to be cloned eliminated the risk
of concomitant cloning of the contiguous regions of the gene,
i.e. the larger portion of the 5.0-kb PvuII fragment could not
have been isolated following cleavage with TaqI and subsequent
size fractionation. Nevertheless, toxicity tests were
routinely performed on the lysates from all primary clones
to exclude the possibility that a toxic molecule was being
produced.

The cloning vector pMTL23 was specifically designed to
facilitate the generation of M13 templates by the sonication
procedure [20]. Cleavage of pCBA2 with the restriction
enzymes BamHI and BglII allowed the excision of the clostridial
DNA insert as a 2.05-kb DNA fragment with compatible
cohesive termini. The gel-purified DNA was circularised by
self-ligation, fragmented by sonication and the random DNA
strands generated as blunt-ended fragments by treatment with
T4 polymerase and cloned into the SmaI site of M13mp8. The
nucleotide sequence data obtained from 200 such recombinant
templates was compiled into a single contiguous sequence
using the computer software of DNASTAR Inc. The insert
Fig. 1. Selection of an oligonucleotide gene probe for the BoNT/A gene. The nucleotides incorporated at positions of degeneracy in the
oligonucleotide probe (bold) were chosen on the basis of clostridial codon usage [31] and those present in the equivalent position in the tetanus
gene [16]. The actual sequence of the BoNT/A gene is shown below the oligonucleotide probe, identity between the two being indicated by |,
mismatch by a lower case letter in the actual sequence, and the G:T pairing by a colon. Actual sequence refers to the BoNT/A gene sequence
... pCBA2) were sequenced by the sonication procedure [24]. The region of the pCBA4 insert encompassed by the single-headed
arrows was sequenced by primer extension in the appropriate direction



of pCBA2 proved to be 1845 bp in length (Fig. 1), and in 
common with previously cloned clostridial DNA, had a low 
G/C content (27.53%). 
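
For illustration, the G/C content quoted above can be computed from any sequence string as in the following minimal sketch. The function name is an assumption, and the test sequence is synthetic rather than the pCBA2 insert. A value of 27.53% over 1845 bp corresponds to 508 G or C bases.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases of a DNA sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Synthetic 1845-base sequence with 508 G residues, for illustration only.
synthetic = "G" * 508 + "A" * 1337
print(round(gc_content(synthetic), 2))
```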

Translation of the DNA in the six possible reading
frames demonstrated the presence of a single open reading
frame (622 codons) encompassing the entire cloned region.
With a single discrepancy (see below), the presence of an
amino acid sequence corresponding to that derived from the
purified H2 fragment [11] confirmed that this open reading
frame encoded BoNT/A. Examination of the nucleotide
sequence in this region indicated that of the eight potential
mismatches between the oligonucleotide probe and the
antisense DNA strand, only two had occurred, one of which
involved a neutral d(G · T) pairing (Fig. 1). From comparison
of the amino acid sequence with tetanus toxin, it was estimated

that pCBA2 encoded 50% of the BoNT/A light chain and
Jo /o of the heavy chain.
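
The six-frame translation used to locate the open reading frame can be sketched as follows. This is a minimal illustration using the standard genetic code, not the DNASTAR software used in the study; all function names are assumptions. A frame is reported as open if its translation contains no stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Map (strand, offset) to the translation of that reading frame."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[(strand, offset)] = translate(s[offset:])
    return frames


def open_frames(seq):
    """Reading frames whose translation contains no stop codon."""
    return [key for key, protein in six_frames(seq).items() if "*" not in protein]


print(six_frames("ATGTAAATG"))
```

For a cloned internal fragment such as the pCBA2 insert, a single frame free of stop codons across the whole insert is the expected signature of a coding region.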



Having cloned the central portion of the BoNT/A gene, 
specific sub-fragments of the pCBA2 insert were used as 
radiolabeled probes to identify, and subsequently clone, 
genomic fragments encoding the remainder of the gene.

Cloning of the 3'-end of the BoNT/A gene

A 650-bp HpaI-TaqI restriction fragment was isolated
from pCBA2 and used as the radiolabelled probe in a Southern
blot analysis of C. botulinum type A genomic DNA. Restriction
enzymes utilised in the cleavage of chromosomal DNA
were those known to cleave both within (identified from
the nucleotide sequence) and outside (using the genomic map
data) the cloned insert of pCBA2. The data obtained indicated
that the 3'-end of the gene resided on a 3.4-kb HpaI-PvuII
fragment. Accordingly, genomic DNA was cleaved with HpaI
and TaqI, size-fractionated, and DNA fragments of approxi-
xhoi? dftf9fceslevdtn Piig*g>c£ a t
P Ciar,Bliha 9hr ly g iai npnr

TTTTTAAAGTAAATACTAATGCCTATTATGAAATGAGTGG^ 

CATGATGCAAAGTTTATAGATAGTTTACAGGAAAACGAATTTCGTCTATAT 960 
naa<Ci d3aqenef r i y y ynkfkdiii3 

TACACTTAATAAAGCTAAATCAATAGTAGGTACTACTGCTTCATTACAGTATATC 1(H0 
cxnk a k 3 iv 9 tta3lq ymknvfkeky 

TCCTATCTGAAGATACATCTGGAAAATTTTCGG^ U2Q 

TACACAGAGGATAATTWTTAAGTTTTTTAAAG^ l20Q 

GATAAATATAGTACCTAAGGTAAATTACACAATATATGAT^ 1280 
J-vpuvnytiydgfnlrntnlaanf 
AAGATTTCCTAATGGAAAAAAGTATGAGTTAGATAAATATACTATGTTCCATTATCTTCGTGCTCAAGAATTTGAACATG 1760

GTAAATCTAGGATTGCTTTAACA^TTCIG^AAC^GCATTA^ „,„ 
•avneaiinpsrvytffas 

T T J TG ^ G T G 3 T ^ T T^ T ^^^ TATGT "" A ^ TCTCT «^«"* CT ««G»"iTAc l920 

Seal t *" ,n ' fl »' ,v «<llvydft 
CGATGAAACTAGCGAAGTAAGTACTA^TAAAATTGCGGATAT^ 

«ttaKiaditilipyigp al 

ATATAGGTAATAIGTTATATAJAGAT^TOAGGIWTTT^ 

CCAGAGATTGCAATACCTGTATTAGGTACTWTGCACWGTATCATATA 2160 

AGATAAIGCTTTAAGIA^GAAATGAAAMI^ATG*^ 

ATACA^GATCTAATAAGAAAAAAA^G^C^A^ 
GICTATAAATAAA^A^ITAATATA^IAAATm^GM ^



CTTATGGTGTTAAACGGTTAGAAGATTTTGXTCCTAGTCTT^AACATGCATTATTAAAGTATATATATGATAATAGAGGA 2560 
pygvkrledfdaelkdallkyiydnrg 

ACTTTAATTGGTCAAGTAG ATAGATTAAAAGATAAAGTTAATAATACACTTAGTACAGATAT ACCTTTTCAGCTTTCCAA 2 6 40 
tligqvdtlkdkvnntlotdipfqisk 

ATACGTAGATAATCAAAGATTATTATCTACATTTACTGAATATATTAAGAATATTATTAATACTTCTATATTGAATTTAA 2720 
yvdnqrllstfteyikniintailnl 

GATATGAAAGTAATCATTTAATAGACTTATCTAGGTATGCATCAAAAATAAATATTGGTAGTAAAGTAAATTTTGATCCA 2800 
r y e a n h 1 idlsryaakinigakvn fdp 

ATAGATAAAAATCAAATTCAATTATTTAATTTAGAAAGTAGTAAAATTGAGGTAATTTTAAAAAATGCTATTGTATATAA 2880 
idknqiql£nleaaki»vilkn'aivyn 
Seal 

TAGTATGTATG AAAATTTT ACTACTAGCTTTTGGATAAGAATTCCTAAGTATTTTAACAGTATAAGTCTAAATAATGAAT 2960 
amyenfat a fwitipkyfnsiolnne 

AT ACA AT AAT AAATTG T ATGG AAAATAATTCAGG ATGG AAAGT ATCACTTAATT ATGG TGAAAT AATCTGG ACTTT ACAG 3040 
ytiincmcnnagwkvslnygoi iwtlq 

GATACTCAGGAAATAAAACAAAGAGTAGTTTTTAAA'fACAGTCAAATGATTAATATATCAGATTATATAAACAGATGGAT 3120 
dtqeikq.rvvfkyaqrainisdyinrwi 

TTTTGTAACTATCACTAATAATAGATTAAATAACTCTAAAATTTATATAAATGGAAGATTAATAGATCAAAAACCAATTT 3200 
fvtitnnrlnnakiyingrlidqkpi 

CAAAT TT AGGT AAT AT TCATGCTAGTAAT AAT AT AATGTT TAAATT AG ATGGTTGT AG AG AT ACAC AT AGAT ATATTTGG 3280 
anlgnihasn.nimfkldgcrdthryiw 

ATAAAATATTTTAATCTTTTTGATAAGGAATTAAATGAAAAAGAAATCAAAGATTTATATGATAATCAATCAAATTCAGG 33 60 
i k y f n 1 fdkeinekeikdlydnqanag 

TATTTTAAAAGACTTTTGGGGTGATTATTTACAATATGATAAACCATACTATATGTTAAATTTATATGATCCAAATAAAT 34 40 
il kdfwgdylqydkpyymlnlydpnk 
Taql 

ATGTCGATGT AAATAATGTAGGTATTAGAGGTTATATGTATCTTAAAGGGCCTAGAGGTAGCGTAATGACTACAAACATT 3520 
yvdvnnvgi rgymylkgprgavmttni 

TATTTAAATTCAAGTTTGTATAGGGGGACAAAATTTATTATAAAAAAATATGCTTCTGGAAATAAAGATAATATTGTTAG 3600 
ylnasly rgtkfiikkyaagnkdnivr 

AAAT AATGATCGTGT ATAT ATTAATGT AGT AGTT AAAAAT AAAGAATAT AGGTTAGCTACT AATGCATCACAGGCAGGCG 3680 
nndrvyinvvvknkeyrlatnaaqag 

TAGAAAAAAT ACT AAGTGCATT AGAAATACCTGATGTAGG AAATCTAAGTC AAGTAGTAGTAATGAAGTCAAAAAATGAT 37 60 
vokilaa lelpdvgnlaqvvvmkaknd 

C AAGGAAT AAC AAATAAATGCAAAATGAATTT ACAAGATAATAATGGGAATGATATAGGCTTT AT AGGATTTC ATCAG TT 3840 
qgltnkc ktnnlqdnngndigfigfhqf 

XhoII 

TAATAATATAGCTAAACTAGTAGCAAGTAATTGGTATAATAGACAAATAGAAAGATCTAGTAGGACTTTGGGTTGCTCAT 3920 
nnlaklvaonwynrqierasctlgca 

GGGAATTT ATTCCTGT AGATGATGGATGGGG AGAAAGGCCACrGTAATTAATCTCAAACTACATGAGTCTGTCAAGAATT 4000 
wef ipvddgvgerpl. 

TTCTGTAAACATCCATAAAAATTTTAAAATTAATATGTTTAAGAATAACTAGATATGAGTOTGCTMGCTAATATCTAG 4080 



TTATTTTAATTTATTCAATATTATTACAGTAAGAAAAAATACTATTTTTATTGTAAATACAAGTTTAGTGGTATATCTCA 4160 



TAAATGATACAAGATATC ATT ATAATGATTTTGCAAATTAT AG1TTTGAATAAAT AT ATTT ACAGT ATTTTTGAAATGAT 42 40 



AATAATTACTTCAAATTCTTTAGTATAATTTTTTAATGTCTTAATTT7TACA 4292 



mately 3.4 kb in size were isolated and cloned into pMTL23
between the SmaI and ClaI sites. It should be noted that the
cloning strategy ensured that the chance of concomitant cloning
of contiguous regions of the BoNT/A gene was extremely remote.
Although cleavage of genomic DNA with both HpaI and PvuII
releases the 3'-end of the gene as the desired 3.4-kb restriction
fragment, the adjacent 5'-end of the gene is fragmented into
a 0.5-kb HpaI, a 0.9-kb HpaI-PvuII and a 0.25-kb PvuII-HpaI
sub-fragment. Such fragments would not be purified
during the isolation of the 3.4-kb DNA restriction fragments.

A total of 1500 recombinant clones were screened by in
situ colony hybridisation, using the insert of pCBA2 as a
radiolabelled probe, and five positive clones identified. One
such clone was chosen for further analysis and its plasmid
isolated (designated pCBA3), and shown to contain a DNA
insert of the expected size. The nucleotide sequence of the
insert of pCBA3 was determined exactly as described for
pCBA2. The sequence obtained demonstrated the expected
overlap with that of pCBA2, allowing the identification of the
translational stop codon of the open reading frame presumed
to encode BoNT/A (see below).

Cloning of the 5'-end of the BoNT/A gene

The cloning of the 5'-end of the BoNT/A gene was undertaken
in a manner analogous to that described for the 3'-end.
In this case the DNA fragment used as a radiolabelled probe
in a Southern blot analysis of the C. botulinum type A genome
was a 0.9-kb PvuII-HpaI fragment derived from pCBA2.
The restriction fragment targeted for cloning was a 2.4-kb
ScaI-TaqI fragment. Concomitant cloning of adjacent
BoNT/A sequences was again avoided since cleavage of
chromosomal DNA with ScaI and TaqI results in the
fragmentation of the central portion of the gene into a 0.55-kb
TaqI-ScaI fragment and two ScaI restriction fragments of 0.55 kb
and 0.75 kb. Appropriately cleaved genomic DNA was therefore
size-fractionated and purified DNA fragments of approximately
2.4 kb inserted into pMTL23 cut with SmaI and ClaI.
A single recombinant plasmid was identified carrying the
desired insert and designated pCBA4.

In contrast to the inserts of the previous two clones, the 
nucleotide sequence of the pCBA4 insert was not derived by 
the sonication procedure, but was cloned into M13mp9 as a 
2.4-kb BamHI-BglII fragment. The nucleotide sequence of
the antisense strand was determined by primer extension using 
custom synthesised oligonucleotides. As it proved impossible 
to clone the same fragment into the vector M13mp8, the 
nucleotide sequence of the sense strand was derived using 
pCBA4 DNA and the appropriate primers. In both cases the 
region sequenced corresponded to the structural gene and the 
immediate 5'-non-coding region. 

Features of the coding region 

The nucleotide sequences derived from pCBA2, pCBA3,
and pCBA4 were compiled into a single contiguous sequence
using the computer software of DNASTAR Inc. The sequence
depicted in Fig. 3 represents a 4292-bp portion of this
sequence and has been determined in its entirety on both
DNA strands. Translation of the sequence identified an open
reading frame of 3891 bp commencing with an AUG start
codon and terminating with a UAA stop codon. The deduced
polypeptide is composed of 1296 amino acid residues with a
predicted molecular mass of 149502 Da. This is in close
agreement with the determined molecular mass of purified
BoNT/A [11]. Comparison of the predicted amino acid
sequence of the open reading frame with published amino acid
sequences determined from purified toxin fragments confirmed
the gene identified as encoding BoNT/A. Thus the 17
N-terminal amino acid residues encoded by the open reading
frame identified agree with the previously determined amino
acid sequence of the BoNT/A light chain, except at amino
acid position 2 where we have Gln rather than the previously
reported Pro [17]. Similarly, with one exception, amino acids
449-465 and 449-483 of our sequence agree with the primary
structure determinations undertaken on the purified
heavy chain by Sathyamoorthy and co-workers [18] and Shone
and co-workers [11], respectively. The one discrepancy arises
at position 480 where we predict Glu in contrast to the
previously reported Pro [11]. The sequence of the predicted
polypeptide at amino acids 873-896 exhibits a high degree of
similarity to the determined amino acid sequence of the BoNT/A
C-terminus [18]. The two exceptions are amino acid positions
876 and 892, where the predicted residues are Thr876
and Ser896, which replace the previously determined Leu and
Lys residues, respectively. It is possible that either the small
amounts of protein available for analysis led to
sequence-determination errors or that strain differences may exist.
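
The figures quoted for the open reading frame are mutually consistent, as the following short check shows: 3891 bp corresponds to 1297 codons, one of which is the UAA stop, leaving 1296 residues. The mean residue mass is derived here only as a plausibility check and is not a value given in the text; the function name is an assumption.

```python
def orf_summary(orf_bp, mass_da):
    """Return (residue count, mean residue mass in Da) for an ORF that includes its stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = orf_bp // 3      # includes the terminating stop codon
    residues = codons - 1     # the stop codon encodes no residue
    return residues, mass_da / residues


residues, mean_mass = orf_summary(3891, 149502)
print(residues, round(mean_mass, 1))
```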

The codon usage exhibited by the BoNT/A gene conforms
to the pattern generally seen for genes isolated from
Clostridium species whose DNA displays a high (70%) d(A/T)
content [31-35]. The principal features of this pattern are the
use of AUG and UAA as the respective translational initiation
and termination codons, and a strong discrimination against
all degenerate codons ending in C or G or, in the case of Ser
and Arg, beginning with C. Thus in the BoNT/A gene, 86.1%
of Arg codons conform to AGN rather than CGN, 69% of
Leu codons conform to UUA as opposed to CUN, while
overall, 90.3% of the degenerate codons end in A or U. The
one exception to this rule appears to be in the choice of Lys
codons, where the frequency of occurrence of the codons AAA
and AAG is almost equal, being 24 and 20 respectively. This
lack of bias is in contradiction to other clostridial genes, e.g.
the distribution in the related tetanus gene is 98 AAA codons
versus 9 AAG codons. A consequence of the observed codon
bias is that many of those codons known to act as modulators
of gene expression in E. coli occur frequently in clostridial
genes [31]. The BoNT/A gene therefore exhibits a 53.8%
preference for the AUA (Ile) codon, 43.7% preference for the
GGA (Gly) codon and an overall 86.1% preference for the
AGN (Arg) codon. Other modulator codons [CGA (Arg),
CGC (Arg) and CUA (Leu)] are used infrequently.
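
Codon-usage statistics of the kind quoted above can be tallied as in the following sketch. It assumes the standard genetic code and an in-frame coding sequence; the short test sequence is invented for illustration and the function names are assumptions. "Degenerate" is taken here to mean any sense codon for an amino acid with more than one codon (i.e. excluding Met, Trp and stops), which may differ in detail from the authors' definition.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (DNA alphabet).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(orf):
    """Count in-frame codons; RNA input (U) is converted to DNA (T)."""
    orf = orf.upper().replace("U", "T")
    return Counter(orf[i:i + 3] for i in range(0, len(orf) - 2, 3))


def fraction_ending_in_a_or_t(counts):
    """Fraction of degenerate sense codons whose third base is A or T (U)."""
    degenerate = {c: n for c, n in counts.items()
                  if CODON_TABLE.get(c) not in (None, "M", "W", "*")}
    total = sum(degenerate.values())
    ending = sum(n for c, n in degenerate.items() if c[2] in "AT")
    return ending / total if total else 0.0


def arg_agn_fraction(counts):
    """Fraction of Arg codons that are AGA or AGG rather than CGN."""
    arg = {c: n for c, n in counts.items() if CODON_TABLE.get(c) == "R"}
    total = sum(arg.values())
    agn = sum(n for c, n in arg.items() if c.startswith("AG"))
    return agn / total if total else 0.0


# Invented coding sequence for illustration: ATG AGA AGA CGT TTA TTG TAA
counts = codon_counts("ATGAGAAGACGTTTATTGTAA")
print(fraction_ending_in_a_or_t(counts), arg_agn_fraction(counts))
```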

With the exception of the N-terminal Met, the deduced amino
acid sequence of BoNT/A corresponds exactly to that deter- 
mined experimentally from purified toxin. Thus, as is the case 
with tetanus toxin [16], secretion of BoNT/A toxin is not 
mediated by possession of a signal peptide [36]. Post-trans- 
lational modification of the nascent polypeptide N-terminus 
is therefore limited to removal of a methionine residue. 



Features of noncoding region 

The noncoding region on the 5' side adjacent to the
BoNT/A structural gene was examined for regulatory
sequences. Putative ribosome-binding sites [37] were identified
five and eight nucleotides upstream from the AUG start
codon, d(AGGTGT) and d(AAGAGG), respectively (Fig. 2).
The latter sequence would appear to be the more favourable
candidate for the BoNT/A ribosome-binding site, since (a) the
average spatial distance between the translational start codon
and the ribosome-binding site sequences of clostridial genes
is eight nucleotides [31]; (b) the BoNT/A ribosome-binding
site exhibits greater overall similarity with procaryotic 16S
rRNA [38]; (c) the alternative candidate, d(AGGTGT), is
preceded by dG, a factor associated with misalignment [39].
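
The spacing between a candidate ribosome-binding site and the start codon can be measured as in the following sketch. Distance is counted here as the number of nucleotides between the 3' end of the motif and the first base of the start codon, which is an assumption about the convention used above. The upstream sequence is hypothetical: only the two hexamers come from the text, and the five bases following them are invented placeholders.

```python
def motif_spacings(upstream, motif):
    """Spacing (nt) between each motif occurrence and the end of `upstream`.

    `upstream` is the sequence ending immediately before the start codon.
    """
    upstream = upstream.upper()
    motif = motif.upper()
    spacings = []
    i = upstream.find(motif)
    while i != -1:
        spacings.append(len(upstream) - (i + len(motif)))
        i = upstream.find(motif, i + 1)
    return spacings


# Hypothetical upstream region: overlapping hexamers followed by 5 invented bases.
upstream = "AAGAGGTGT" + "TTTTA"
print(motif_spacings(upstream, "AGGTGT"), motif_spacings(upstream, "AAGAGG"))
```

Under this convention the two hexamers overlap by three nucleotides, which is consistent with the reported spacings of five and eight.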

A putative promoter region, similar to E. coli and Bacillus
subtilis promoters [40], was identified [d(-35-TGGTCA);
d(-10-TTTAAT)] 5' to the BoNT/A structural gene. The
spatial distance between these two sequences (16 nucleotides)
is consistent with that found in E. coli [40], B. subtilis [41] and
Clostridium spp. [31], viz. ±17, ±16-19 and ±16 or ±17,
respectively. Further analysis of the region for the extended
promoter sequence found in Gram-positive organisms [42]
showed that the highly conserved dT residue at -16, the dA
residue at -6 and the dT residues at -5 and -3 occurred.
Only 7 of the 12 conserved residues in the -4 to -18 region
were observed [42]; however, this may be a consequence of the
high d(A/T) content of the DNA (82%) in the noncoding
region.
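
A search for a -35/-10 hexamer pair separated by a permitted spacer can be sketched with a regular expression. The hexamers are those given above; the default spacer range of 16-19 nucleotides is taken from the ranges quoted for the three organisms, and the flanking and spacer bases in the example are invented. The function name is an assumption.

```python
import re


def find_promoters(seq, minus35, minus10, min_gap=16, max_gap=19):
    """Return (0-based position of the -35 hexamer, spacer length) for each match."""
    pattern = re.compile(
        f"(?=({re.escape(minus35)})"
        f"([ACGT]{{{min_gap},{max_gap}}}?)"
        f"({re.escape(minus10)}))"
    )
    return [(m.start(1), len(m.group(2))) for m in pattern.finditer(seq.upper())]


# Invented context: only the two hexamers are taken from the text.
example = "CC" + "TGGTCA" + "ACGT" * 4 + "TTTAAT" + "GG"
print(find_promoters(example, "TGGTCA", "TTTAAT"))
```

The lookahead allows overlapping candidates to be reported, and the lazy quantifier returns the shortest acceptable spacer for each -35 position.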

The AUG translational initiation codon of the BoNT/A
gene is preceded by a region of dyad symmetry, which if
transcribed would form a hairpin loop structure with ΔG =
-48.5 kJ [43]. In such a structure the putative -35 region
would be concealed within the stem (as would the extreme
5'-end of the ribosome-binding site), while the -10 sequence
would be situated within the loop. Stem loops 5' to the
ribosome-binding site and start codon have been described in
E. coli and implicated in mRNA stability by protecting the
translation initiation complex [44]. A second region of dyad
symmetry was observed 75 nucleotides 3' to the UAA
termination codon, and resembles a ρ-independent transcriptional
terminator since it is followed by a stretch of dT (five out
of eight). An mRNA transcript of this region would have a
calculated ΔG = -71.1 kJ [43]. Two further regions of dyad
symmetry were also present on the 3' side of this sequence,
although the stem loop structures that could be formed were
thermodynamically less stable, having ΔG values of -19.2 kJ
TBT SRSFLVNQMIMEAKKQLLBFDTQSKNILMQYIKANSKFI0ITBLKKLBSKIHKVFSTPIPF3YSKNLDC— WVDNBBDIDVILKKSTILNLDINNDII3DI

B50* " " 

900v v v v v 950v v v v 

BOT 3RYASKINrG3KVNFDP-XDKN0XQLFNLBS3KIEVILKNAIVYNSMYEHFST3FWXRIPKYPNS — I3LNNBYTIINCMBNN3 GWKV3LNYGEI 

3:3: P I: I L N ESS : V I YN M: NP: SPW:R:PK S NBY:II MS GW V3L : 

TBT SGPNSSVITYPDAQLVPCINGKAIHLVNNESSEVIVHKAMDIBYNDMFNNFTVSFWLRVPKVSASHLBQY0TNBYSIIB3MKKHSLSIG3GWSVSLKGNNL 

950* .... 1000* 

▼ lOOOv v v v v 1050v .v v v 

BOT IWTLqDTQEIKQRVVFKYSQHlNISDYI-NRWIFVTITHMRLNNSKIYINGRLIDQKPISfJLGNIHASNNIHFKLDGCRDTHRYIWIKYFNLFDKBLHEKE 
IWTL D: : F: Y: N:W:F:TITW:R :YINO L I: 10 I NMI KLD C : Y: I F :F K LH KB 

TET IWTLKDSAGBVRQITFR-DLPDKFNAYLAHKWVFITITNDRLSSANLYINGVLM03AEITGLCAIRBDNNITLRLDRCNNNNQYVSIDKFRIFCKALNPKB 

1050* v 

v 1 lOOv v v v v 1 1 50v v v v 

BOT XKDLYDNQ3K3GILKDFWGDYLQYDKPYYMLNLYDPHKYVDVNNVGIRGYMYLKGPRGSVMTTNIYLNSSLYRGTKPIIKKYASGNK-DNIVRNND— RVY 
I LY L:DFWO; L YD YY : K V : N: YMYL LY O KFIIKrY N D V: D ::Y 

TET IBKLYTSYL3ITFLRDFWCNPLRYDTEYYLIPVA3S3KDVQLKNI— TDYMYLTNAP3YTNGKLNIYYRRLYNGLKFIIKRYTPNNBIDSPVKSGDFIKLY 

1150- - - - * 

v 1200v v v v v I250v v v 

BOT INVWKNKEYRLATWA3Q-AGVEKILS-ALBIPDVGNLSQVVVMKSKNDQGITNKCKHNLQDMNGNDI0FIGFH— QF— NNIAKLVASNWYNRQIBRSS 

: : ::IL P : K : : : L D: :G :G H 0 N L : ASNWY 

TBT VSYNMNBHIVGYPKDGNAFNNLDRILRVGYNAPGIPLYKKMEAVKLRDLKTYSV0--LKLYDDKNA3LCLVGTHNGQI0NDPNRDILIASNWYFNHL— KD 

1250" * 

v 1290v 
BOT RTLGCSWBFIPVDDGWGERPL 

: LGC W F:P D:OW 
TET KILGCDWYFVPTDEGWTND 
1300* 

Fig. 4. Comparative alignment of the primary amino acid sequences of the BoNT/A (BOT) and tetanus (TET) neurotoxins. Identical amino
acids are as shown, conservative replacements are indicated by colons. The N-terminal amino acids of the heavy chain (H-CHAIN) of both
toxins are marked by right-angled arrows. The Cys residues thought to be involved in the formation of the disulphide bridge, linking the heavy
chain and light chain, are the first Cys residues on either side of these arrows



(nucleotides 4104-4136) and ΔG = -34.3 kJ (nucleotides
4158-4188).
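
Regions of dyad symmetry of the kind described above can be located by searching for a short stretch followed, after a loop, by its reverse complement. The sketch below does only that: it reports perfect inverted repeats and does not calculate ΔG, for which the authors relied on the method of reference [43]. Stem and loop sizes are arbitrary illustrative defaults, and the test sequence is invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(seq, stem=6, min_loop=3, max_loop=8):
    """Return (0-based start of 5' stem arm, loop length) for perfect inverted repeats."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if seq[j:j + stem] == target:
                hits.append((i, loop))
    return hits


# Invented sequence: a 6-bp stem (GCGCAA / TTGCGC) around a 4-nt loop.
print(find_hairpins("ATGCGCAATTTTTTGCGCTA"))
```

A real terminator search would additionally tolerate mismatches in the stem and check for the run of dT residues 3' to the hairpin.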

Similarity to tetanus toxin 

The structural and pharmacological relatedness of botulinum
and tetanus toxin has long been recognised. Although
the recent determination of the amino acid sequence of tetanus
toxin allowed limited comparisons with botulinum toxins to
be made [16], the availability of the entire amino acid sequence
of BoNT/A now allows a more complete comparative analysis.
Comparative alignment of tetanus toxin and BoNT/A
neurotoxin (Fig. 3) reveals that the two toxins exhibit 32.9%
similarity. It is apparent that overall the heavy chains exhibit
a higher degree of similarity than the light chains, at both the
level of identity (heavy chain, 34.5%; light chain, 29.5%) and


Fig. 5. Dot matrix plot of BoNT/A and tetanus toxin similarity. Similarity
parameters show a 60% match with a window size of 10 amino
acids



similarity (heavy chain, 47.9%; light chain, 41.8%). This is
expected since the heavy chain is the receptor-binding domain
and the light chain the effector, and such membrane/receptor-binding
proteins have similar structural requirements. It is
apparent that specific regions of both the heavy chains and
light chains exhibit substantial similarity. The higher degree
of conservation of primary sequence between the heavy chains
of the two toxins is more apparent when the two proteins are
aligned using a dot matrix plot. The illustration in Fig. 4
represents such an alignment, using a window size of 10 amino
acids and scoring 60% similarity.
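
The principle of a dot matrix comparison with a 10-residue window and a 60% threshold can be sketched as follows. This simplified version scores identical residues only (at least 6 of 10), whereas the plot described above scores similarity and so presumably also counts conservative replacements; the function name and the test sequences are illustrative assumptions.

```python
def dot_matrix(a, b, window=10, min_matches=6):
    """Return (i, j) pairs where a[i:i+window] and b[j:j+window] share >= min_matches identities."""
    dots = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(1 for k in range(window) if a[i + k] == b[j + k])
            if matches >= min_matches:
                dots.append((i, j))
    return dots


# Toy sequences for illustration; a diagonal run of dots indicates a conserved region.
print(dot_matrix("ABCDEFGHIJ", "ABCDEFXXXX"))
```

Plotting the returned pairs with one toxin on each axis yields diagonals wherever the two sequences are colinear and conserved.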

Conservation of Cys residues in tetanus and botulinum
toxins was observed at positions 1060 and 1280. Of particular
note is the conservation of Cys454 which occurs at the same
position in the N-termini of C. botulinum type A, B and E and
on the tetanus toxins [16]. Cys454 is the only Cys residue in
the N-terminal region of the heavy chain and is probably 
involved in the disulphide bridge to the light chain. Similarly, 
Cys430 on the light chain of the botulinum toxin is conserved 
in the same position as in the tetanus toxin, and from the 
positions and predicted structural comparisons of the other 
two Cys residues in this domain, is the most probable Cys 
residue to form a disulphide bridge linking the light and heavy 
chains. 

The similarity shared by the botulinum type A and tetanus 
toxins, and pattern of alignment displayed in the dot-matrix
plot (Fig. 3), suggest that they are derived from a common 
ancestral gene. Similar conclusions have been drawn from 
comparisons of short N-terminal amino acid sequences of 
the light chains and heavy chains of purified tetanus and 
botulinum type A, B and E toxin fragments [16]. 

Structural features 

Comparative alignment of tetanus and BoNT/A with a 
dot-matrix plot identified six regions where 80% similarity 
occurred (regions in question are marked on Fig. 3). Of these 



regions, four occurred in the N-terminus of the heavy chain.
Hydrophilicity plots of the predicted amino acid sequence of
botulinum type A toxin, according to the methods of Kyte
and Doolittle [45], and Hopp and Woods [46], showed that
three of these highly conserved sequences occurred in
essentially hydrophobic regions. The first of these predicted
regions contained 19 residues of uncharged amino acids from
Ile630-Tyr648. The second region, which is separated from the
first region by three charged amino acid residues, extends from
Phe652-Asn687 and contains two acidic residues. The third
region involves Leu773-Val807. The ratio of hydrophobic
amino acids/polar amino acids contained in the first two
regions was greater (74:20) than in the third region (54:40).
Hydrophilicity plots [45, 46] of tetanus toxin showed that the
equivalent three regions also occurred in areas of
hydrophobicity. One of these regions in tetanus toxin
(Asn660-Ala691) has been identified as sufficient in length and
hydrophobicity to potentially span mammalian cell membranes [36].
The first two hydrophobic regions identified in botulinum
type A toxin also have this potential. The conserved regions
occurring in the N-terminal domain of the heavy chain are
hydrophobic, which is implicated in channel-forming activities
in both toxins in vitro [6, 7, 9, 11]. Although diphtheria toxin
is structurally and functionally similar to the botulinum and
tetanus neurotoxins, no significant overall similarity to the C.
botulinum type A toxin was found.
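
A hydropathy profile of the Kyte and Doolittle type [45] is a sliding-window average of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale; the default window of 19 residues is an assumption (a width commonly used when looking for membrane-spanning segments, and equal to the length of the Ile630-Tyr648 region), since the window used by the authors is not stated. The function name is illustrative.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean Kyte-Doolittle hydropathy for every full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


# Toy peptide: a hydrophobic stretch flanked by charged residues.
profile = hydropathy_profile("KKDE" + "LIVLIVLIVL" + "KKDE", window=10)
print([round(x, 2) for x in profile])
```

Windows whose mean lies well above zero correspond to hydrophobic regions; a hydrophilicity plot of the Hopp and Woods type [46] follows the same windowing scheme with a different scale.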

Of the three known proteolytic cleavage sites in the
botulinum toxin protein sequence, the first (448-449), which
produces toxin from the inactive precursor, occurs within a
predicted short helix between two areas of β-pleated sheets.
The papain cleavage site (856-857) occurs at a β-turn
between two β-pleated sheets. The tryptic cleavage site
(873-874), which is specific to botulinum toxin, again occurs at a
β-turn between two β-pleated sheets.

Although amino acid sequence fingerprintine [47] for the 
occurrence of the classical A DP ribose binding fold [48] reveals 
no sign of this motif, the light chain of BoNT/A has a predicted 
structure rich in /?-pteated sheets, ct-helices and 0-pleated sheet 
folds. In addition the distribution of Lys and Arg residues in 
the light chain, which predominate the C-terminal one third 
of the light chain, together with other partial amino acid 
sequence similarity, suggest a structure capable of binding a 
nucleic acid or nucleotide cofactor. The closest light chain 
primary structure similarity is found with a plethora of 
unrelated proteins but amongst these are the group of acetyl-
CoA acyltransferases. More detailed analysis of the strikingly 
similar amino acid sequences, e.g. 223-230, also reveals an 
amino acid sequence corresponding to RNA-, DNA- and 
nucleotide-binding proteins. The Lys and Arg distribution is 
such that while the whole molecule has a predicted pI of 6.3,
the light chain is basic (in contrast to tetanus toxin) and the 
heavy chain acidic. The predicted secondary structure of the 
heavy chain is reminiscent of many receptor- or immuno-
globulin-binding proteins in that it contains long stretches of
α-helix, β-pleated sheet and β-turn structures, and import-
antly, such structures abound in the C-terminal domain from
residue 850 onwards. Comparison of the heavy chain se- 
quences with the data base indicates that the closest primary 
structure similarity is found with a number of receptors and 
receptor-binding proteins including the acetyl cholinesterase 
precursor protein. Secondary structure comparisons of the 
botulinum and tetanus toxin sequences show greater differ- 
ences between the light chain structures than with the heavy 
chain structures, in agreement with the above observations 
and known functions of these molecules. 
Conclusion

C. botulinum is frequently used as a test organism by
microbiologists in the food industry, while its neurotoxins are
used by a growing number of neurobiochemists in the study
of nerve action. More importantly, this neurotoxin is finding 
ever increasing clinical uses as an alternative to surgical ma-
nipulation of a wide range of aberrant muscular functions
[49]. A consequence of these applications is the required immu- 
nisation of the personnel involved. The availability of the 
BoNT/A gene sequence will therefore not only facilitate 
structure/function studies but also allow the production of 
toxin for clinical use and toxoid for the formulation of im- 
proved vaccines. 

We are grateful to Ken Fan lorn and Roy Marl well for synthesis
of the oligonucleotides used in this study and to Nicola Minion for
the preparation of this manuscript.
LOCUS       CLOBOTB                 4041 bp    DNA     linear   BCT 26-APR-1993
DEFINITION  Clostridium botulinum neurotoxin type B (botB) gene, complete cds.
ACCESSION   M81186
VERSION     M81186.1  GI:144734
KEYWORDS    botB gene; neurotoxin type B.
SOURCE      Clostridium botulinum
  ORGANISM  Clostridium botulinum
            Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridiaceae;
            Clostridium.
REFERENCE   1  (bases 1 to 4041)
  AUTHORS   Whelan,S.M., Elmore,M.J., Bodsworth,N.J., Brehm,J.K., Atkinson,T.
            and Minton,N.P.
  TITLE     Complete nucleotide sequence of the Clostridium botulinum gene
            encoding the type B neurotoxin
  JOURNAL   Unpublished (1991)
COMMENT     Original source text: Clostridium botulinum DNA.
FEATURES             Location/Qualifiers
     source          1..4041
                     /organism="Clostridium botulinum"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:1491"
     gene            57..3932
                     /gene="botB"
     CDS             57..3932
                     /gene="botB"
                     /function="vertebrate neurotoxin"
                     /codon_start=1
                     /transl_table=11
                     /product="neurotoxin type B"
                     /protein_id="AAA23211.1"
                     /db_xref="GI:144735"
/translation="MPVTINNFNYNDPIDNNNIIMMEPPFARGTGRYYKAFKITDRIW
IIPERYTFGYKPEDFNKSSGIFNRDVCEYYDPDYLNTNDKKNIFLQTMIKLFNRIKSK
PLGEKLLEMIINGIPYLGDRRVPLEEFNTNIASVTVNKLISNPGEVERKKGIFANLII
FGPGPVLNENETIDIGIQNHFASREGFGGIMQMKFCPEYVSVFNNVQENKGASIFNRR 
GYFSDPALILMHELIHVLHGLYGIKVDDLPIVPNEKKFFMQSTDAIQAEELYTFGGQD 
PSIITPSTDKSIYDKVLQNFRGIVDRLNKVLVCISDPNININIYKNKFKDKYKFVEDS
EGKYSIDVESFDKLYKSLMFGFTETNIAENYKIKTRASYFSDSLPPVKIKNLLDNEIY 
TIEEGFNISDKDMEKEYRGQNKAINKQAYEEISKEHLAVYKIQMCKSVKAPGICIDVD 
NEDLFFIADKNSFSDDLSKNERIEYNTQSNYIENDFPINELILDTDLISKIELPSENT 
ESLTDFNVDVPVYEKQPAIKKIFTDENTIFQYLYSQTFPLDIRDISLTSSFDDALLFS 
NKVYSFFSMDYIKTANKVVEAGLFAGWVKQIVNDFVIEANKSNTMDKIADISLIVPYI 
GLALNVGNETAKGNFENAFEIAGASILLEFIPELLIPVVGAFLLESYIDNKNKIIKTI 
DNALTKRNEKWSDMYGLIVAQWLSTVNTQFYTIKEGMYKALNYQAQALEEIIKYRYNI 
YSEKEKSNINIDFNDINSKLNEGINQAIDNINNFINGCSVSYLMKKMIPLAVEKLLDF 
DNTLKKNLLNYIDENKLYLIGSAEYEKSKVNKYLKTIMPFDLSIYTNDTILIEMFNKY
NSEILNNIILNLRYKDNNLIDLSGYGAKVEVYDGVELNDKNQFKLTSSANSKIRVTQN 
QNIIFNSVFLDFSVSFWIRIPKYKNDGIQNYIHNEYTIINCMKNNSGWKISIRGNRII
WTLIDINGKTKSVFFEYNIREDISEYINRWFFVTITNNLNNAKIYINGKLESNTDIKD 
IREVIANGEIIFKLDGDIDRTQFIWMKYFSIFNTELSQSNIEERYKIQSYSEYLKDFW
GNPLMYNKEYYMFNAGNKNSYIKLKKDSPVGEILTRSKYNQNSKYINYRDLYIGEKFI 
IRRKSNSQSINDDIVRKEDYIYLDFFNLNQEWRVYTYKYFKKEEEKLFLAPISDSDEF 
YNTIQIKEYDEQPTYSCQLLFKKDEESTDEIGLIGIHRFYESGIVFEEYKDYFCISKW 
YLKEVKRKPYNLKLGCNWQFIPKDEGWTE" 

ORIGIN 

1 tgcgcattta tgggcattaa aagggatata aacttaaaat aaggaggaga atatttatgc
61 cagttacaat aaataatttt aattataatg atcctattga taataataat attattatga 
121 tggagcctcc atttgcgaga ggtacgggga gatattataa agcttttaaa atcacagatc
181 gtatttggat aataccggaa agatatactt ttggatataa acctgaggat tttaataaaa 
241 gttccggtat ttttaataga gatgtttgtg aatattatga tccagattac ttaaatacta
301 atgataaaaa gaatatattt ttacaaacaa tgatcaagtt atttaataga atcaaatcaa 
361 aaccattggg tgaaaagtta ttagagatga ttataaatgg tataccttat cttggagata 
421 gacgtgttcc actcgaagag tttaacacaa acattgctag tgtaactgtt aataaattaa
481 tcagtaatcc aggagaagtg gagcgaaaaa aaggtatttt cgcaaattta ataatatttg
541 gacctgggcc agttttaaat gaaaatgaga ctatagatat aggtatacaa aatcattttg 
601 catcaaggga aggcttcggg ggtataatgc aaatgaagtt ttgcccagaa tatgtaagcg
661 tatttaataa tgttcaagaa aacaaaggcg caagtatatt taatagacgt ggatattttt 
721 cagatccagc cttgatatta atgcatgaac ttatacatgt tttacatgga ttatatggca
781 ttaaagtaga tgatttacca attgtaccaa atgaaaaaaa attttttatg caatctacag 
841 atgctataca ggcagaagaa ctatatacat ttggaggaca agatcccagc atcataactc
901 cttctacgga taaaagtatc tatgataaag ttttgcaaaa ttttagaggg atagttgata
961 gacttaacaa ggttttagtt tgcatatcag atcctaacat taatattaat atatataaaa 
1021 ataaatttaa agataaatat aaattcgttg aagattctga gggaaaatat agtatagatg 
1081 tagaaagttt tgataaatta tataaaagct taatgtttgg ttttacagaa actaatatag 
1141 cagaaaatta taaaataaaa actagagctt cttattttag tgattcctta ccaccagtaa 
1201 aaataaaaaa tttattagat aatgaaatct atactataga ggaagggttt aatatatctg 
1261 ataaagatat ggaaaaagaa tatagaggtc agaataaagc tataaataaa caagcttatg 
1321 aagaaattag caaggagcat ttggctgtat ataagataca aatgtgtaaa agtgttaaag 
1381 ctccaggaat atgtattgat gttgataatg aagatttgtt ctttatagct gataaaaata 
1441 gtttttcaga tgatttatct aaaaacgaaa gaatagaata taatacacag agtaattata 
1501 tagaaaatga cttccctata aatgaattaa ttttagatac tgatttaata agtaaaatag 
1561 aattaccaag tgaaaataca gaatcactta ctgattttaa tgtagatgtt ccagtatatg 
1621 aaaaacaacc cgctataaaa aaaattttta cagatgaaaa taccatcttt caatatttat 
1681 actctcagac atttcctcta gatataagag atataagttt aacatcttca tttgatgatg 
1741 cattattatt ttctaacaaa gtttattcat ttttttctat ggattatatt aaaactgcta
1801 ataaagtggt agaagcagga ttatttgcag gttgggtgaa acagatagta aatgattttg 
1861 taatcgaagc taataaaagc aatactatgg ataaaattgc agatatatct ctaattgttc 
1921 cttatatagg attagcttta aatgtaggaa atgaaacagc taaaggaaat tttgaaaatg 
1981 cttttgagat tgcaggagcc agtattctac tagaatttat accagaactt ttaatacctg
2041 tagttggagc ctttttatta gaatcatata ttgacaataa aaataaaatt attaaaacaa 
2101 tagataatgc tttaactaaa agaaatgaaa aatggagtga tatgtacgga ttaatagtag 
2161 cgcaatggct ctcaacagtt aatactcaat tttatacaat aaaagaggga atgtataagg 
2221 ctttaaatta tcaagcacaa gcattggaag aaataataaa atacagatat aatatatatt 
2281 ctgaaaaaga aaagtcaaat attaacatcg attttaatga tataaattct aaacttaatg 
2341 agggtattaa ccaagctata gataatataa ataattttat aaatggatgt tctgtatcat 
2401 atttaatgaa aaaaatgatt ccattagctg tagaaaaatt actagacttt gataatactc 
2461 tcaaaaaaaa tttgttaaat tatatagatg aaaataaatt atatttgatt ggaagtgcag
2521 aatatgaaaa atcaaaagta aataaatact tgaaaaccat tatgccgttt gatctttcaa 
2581 tatataccaa tgatacaata ctaatagaaa tgtttaataa atataatagc gaaattttaa 
2641 ataatattat cttaaattta agatataagg ataataattt aatagattta tcaggatatg 
2701 gggcaaaggt agaggtatat gatggagtcg agcttaatga taaaaatcaa tttaaattaa 
2761 ctagttcagc aaatagtaag attagagtga ctcaaaatca gaatatcata tttaatagtg 
2821 tgttccttga ttttagcgtt agcttttgga taagaatacc taaatataag aatgatggta
2881 tacaaaatta tattcataat gaatatacaa taattaattg tatgaaaaat aattcgggct
2941 ggaaaatatc tattaggggt aataggataa tatggacttt aattgatata aatggaaaaa
3001 ccaaatcggt attttttgaa tataacataa gagaagatat atcagagtat ataaatagat
3061 ggttttttgt aactattact aataatttga ataacgctaa aatttatatt aatggtaagc
3121 tagaatcaaa tacagatatt aaagatataa gagaagttat tgctaatggt gaaataatat
3181 ttaaattaga tggtgatata gatagaacac aatttatttg gatgaaatat ttcagtattt
3241 ttaatacgga attaagtcaa tcaaatattg aagaaagata taaaattcaa tcatatagcg
3301 aatatttaaa agatttttgg ggaaatcctt taatgtacaa taaagaatat tatatgttta
3361 atgcggggaa taaaaattca tatattaaac taaagaaaga ttcacctgta ggtgaaattt
3421 taacacgtag caaatataat caaaattcta aatatataaa ttatagagat ttatatattg
3481 gagaaaaatt tattataaga agaaagtcaa attctcaatc tataaatgat gatatagtta
3541 gaaaagaaga ttatatatat ctagattttt ttaatttaaa tcaagagtgg agagtatata
3601 cctataaata ttttaagaaa gaggaagaaa aattgttttt agctcctata agtgattctg
3661 atgagtttta caatactata caaataaaag aatatgatga acagccaaca tatagttgtc
3721 agttgctttt taaaaaagat gaagaaagta ctgatgagat aggattgatt ggtattcatc
3781 gtttctacga atctggaatt gtatttgaag agtataaaga ttatttttgt ataagtaaat
3841 ggtacttaaa agaggtaaaa aggaaaccat ataatttaaa attgggatgt aattggcagt
3901 ttattcctaa agatgaaggg tggactgaat aatataacta tatgctcagc aaacctattt
3961 tatataagaa aagtttaagt ttataaaatc ttaagtttaa ggatgtagct aaattttgaa
4021 tattagataa actacatgtt t

//
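
The relationship between the ORIGIN bases and the /translation qualifier in the record above can be illustrated with a short sketch. It is not part of the GenBank entry. It assumes the standard amino acid assignments (NCBI translation table 11 uses the same assignments and differs only in the permitted start codons), and the function name `translate` is hypothetical. The sample bases are positions 57 to 80 as printed in the ORIGIN lines.

```python
# Standard genetic code laid out in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, stopping at the first stop codon.

    Whitespace (as in GenBank ORIGIN blocks) is ignored.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Bases 57-80 of the record: the start of the botB coding sequence.
cds_start = "atgc cagttacaat aaataatttt"
print(translate(cds_start))
```

The eight residues produced from these bases can be compared by eye with the first eight residues of the /translation qualifier.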
DNA fragments derived from the Clostridium botulinum type A neurotoxin (BoNT/A) gene (botA) were used
in DNA-DNA hybridization reactions to derive a restriction map of the region of the C. botulinum type B strain
Danish chromosome encoding botB. As the one probe encoded part of the BoNT/A heavy (H) chain and the
other encoded part of the light (L) chain, the position and orientation of botB relative to this map were
established. The temperature at which hybridization occurred indicated that a higher degree of DNA homology
occurred between the two genes in the H-chain-encoding region. By using the derived restriction map data, a
2.1-kb BglII-XbaI fragment encoding the entire BoNT/B L chain and 108 amino acids of the H chain was cloned
and characterized by nucleotide sequencing. A contiguous 1.8-kb XbaI fragment encoding a further 623 amino
acids of the H chain was also cloned. The 3' end of the gene was obtained by cloning a 1.6-kb fragment amplified
from genomic DNA by inverse polymerase chain reaction. Translation of the nucleotide sequence derived from
all three clones demonstrated that BoNT/B was composed of 1,291 amino acids. Comparative alignment of its
sequence with all currently characterised BoNTs (A, C, D, and E) and tetanus toxin (TeTx) showed that a wide
variation in homology occurred dependent on which component of the dichain was compared. Thus,
the L chain of BoNT/B exhibits the greatest degree of homology (50% identity) with the TeTx L chain, whereas
its H chain is most homologous (48% identity) with the BoNT/A H chain. Overall, the six neurotoxins were
shown to be composed of highly conserved amino acid domains interceded with amino acid tracts exhibiting
little overall similarity. In total, 68 amino acids of an average of 442 are absolutely conserved between L chains
and 110 of 845 amino acids are conserved between H chains. Conservation of Trp residues (one in the L chain
and nine in the H chain) was particularly striking. The most divergent region corresponds to the extreme
carboxy terminus of each toxin, which may reflect differences in specificity of binding to neurone acceptor sites.

Botulinum neurotoxin (BoNT) and tetanus toxin (TeTx) are [...] TeTx involves three distinct phases. In the
first phase the [...] followed by an energy-[...] Thereafter, an unidentified active moiety of the toxin causes
nerve cell dysfunction by blocking the intracellular release of neurotransmitters. The two classes of toxins
differ, however, in that BoNT preferentially inhibits acetylcholine release at the nerve periphery whereas TeTx
[...] TeTx (11), BoNT/A (4, 44), BoNT/C [...], BoNT/E [...] the cloning of the gene encoding BoNT/B and the
derivation of the entire amino acid sequence of the neurotoxin by nucleotide sequencing.

* Corresponding author.

MATERIALS AND METHODS

Bacterial strains, plasmids, and culture conditions. The
source of chromosomal DNA was C. botulinum Danish, and
the recombinant host used for cloning experiments was
Escherichia coli TG1 [Δ(lac-pro) supE thi hsdD5/F' traD36
proA+B+ lacIq lacZΔM15]. Cloning vectors employed were
plasmids pMTL32 (this study), pMTL23 (7), and pCR1000
(26) and the M13 bacteriophages mp18 and mp19 (46). C.
botulinum was cultivated in USA II broth (2% peptone, 1%
yeast extract, 1% N-Z amine, 0.05% sodium mercaptoace-
tate, 1% glucose [pH 7.4]), and E. coli was cultivated in L
broth (1% tryptone, 0.5% yeast extract, 0.5% NaCl). Solid-
ified medium (L agar) consisted of L broth with the addition
of 2% (wt/vol) Bacto Agar (Difco Laboratories). Antibiotic
concentrations used for the maintenance and the selection of
transformants were 50 µg of ampicillin (pMTL32) and 50 µg
of kanamycin (pCR1000) per ml. Restriction endonucleases
and DNA modifying enzymes were purchased from 
Northumbria Biochemicals Ltd., Taq polymerase was from 
United States Biochemical Corporation, and radiolabel was 
from Amersham International. 

Purification and manipulation of DNA. Transformation of 
E. coli and large-scale plasmid isolation procedures were as
described previously (27). Small-scale plasmid isolation was 
by the method of Holmes and Quigley (19), while chromo- 
somal DNA from C. botulinum was prepared essentially as 
described by Marmur (23). Restriction endonucleases and 
DNA modifying enzymes were used under the conditions 
recommended by the suppliers. Digests were electro- 
phoresed in 1% agarose slab gels on a standard horizontal 
system (Bethesda Research Laboratories model H4), em- 
ploying Tris borate-EDTA (0.09 M Tris borate, 0.002 M 
EDTA) buffer. Fragments were isolated from gels by elec-
troelution (25). All primary cloning procedures were under-
taken under United Kingdom ACGM C2 containment con-
ditions, and total cell lysates of all recombinants carrying
cloned material were tested in mice for the absence of toxic 
polypeptides. 

DNA-DNA hybridization experiments. DNA restriction 
fragments were transferred from agarose gels to Zeta Probe 
nylon membrane by the procedure of Reed and Mann 
(34). After partial depurination with 0.25 M HCl (15 min), 
DNA was transferred in 0.4 M NaOH by capillary elution for 
4 to 16 h. Bacterial colonies were screened for desired 
recombinant plasmids by in situ colony hybridization (13), 
using nitrocellulose filter disks (0.22 µm; Schleicher and
Schuell). The gel-purified botA DNA fragments were la-
belled with [α-32P]dATP, using a multiprime kit supplied by
Amersham International. Hybridizations were carried out as 
described previously (44) at temperatures ranging from 45 to 
60°C.

Nucleotide sequence of pCBB plasmid inserts. The insert of
plasmid pCBB1 was excised by cleavage with BamHI and
BglII and circularized by treatment with T4 ligase, and
size-fractionated 500- to 1,000-bp fragments generated by
sonication were cloned into the SmaI site of M13mp18
(for experimental conditions, see reference 28). Approxi-
mately 50 templates were then sequenced by the dideoxy-
nucleotide method of Sanger et al. (35), using a modified
version of bacteriophage T7 DNA polymerase, Sequenase
(43). Experimental conditions used were those stated by the
supplier (United States Biochemical Corp.). The inserts of
plasmids pCBB2 and pCBB3 were sequenced by using
templates derived by subcloning the entire region between
the appropriate sites of M13mp18 and M13mp19. Sequence
data obtained by employing universal primer were then
sequentially extended by the use of custom-synthesized
oligonucleotide primers. In certain instances, templates
were generated by the insertion of DraI restriction subfrag-
ments into the SmaI site of M13mp18. In all cases the
sequence was determined on both DNA strands. The chro-
mosomal DNA region amplified with primers X1 and X2
(Fig. 1) was cloned directly into ddT-tailed, SmaI-cut
M13mp8 (prepared by incubating SmaI-cut DNA with ter-
minal transferase in the presence of dideoxy TTP), and the
resultant template was sequenced with universal primer.
DNA sequence data were analyzed by using the computer
software of DNASTAR Inc.

Amplification of DNA by PCR. Amplification of C. botuli-
num DNA was undertaken by polymerase chain reaction
(PCR), using an MJ Research Inc. thermal cycler. Reaction
mixtures contained 10 mM Tris-HCl, 50 mM KCl, 3 mM
MgCl2, 0.1 mM deoxynucleoside triphosphate, 30 pmol of
each primer, 2.5 U of Taq polymerase, and 10 ng of strain
Danish genomic DNA, in a final volume of 0.1 ml. Amplifi-
cation was for 30 cycles, as follows: 1.5 min at 93°C, 3 min
at 37°C, and 3 min at 72°C. For inverse PCR, 140 ng of
chromosomal DNA, cleaved with an appropriate restriction
endonuclease, was ligated overnight at 14°C in a 50-µl
volume and a 10-µl portion of the resultant concatenated
DNA was used in PCR.
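
As a rough illustration of the cycling program described above, the following sketch encodes the three steps and sums the programmed hold time. It is only an arithmetic aid under stated assumptions: the step names and the `Step` class are hypothetical, and ramp times between temperatures (which depend on the instrument) are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str        # hypothetical label, not taken from the text
    temp_c: float    # hold temperature in degrees Celsius
    minutes: float   # hold time in minutes


# Values as stated in the methods: 1.5 min at 93, 3 min at 37, 3 min at 72.
PROGRAM = [
    Step("denature", 93, 1.5),
    Step("anneal", 37, 3.0),
    Step("extend", 72, 3.0),
]
CYCLES = 30


def total_hold_minutes(program, cycles):
    """Total programmed hold time in minutes, excluding ramping."""
    return cycles * sum(step.minutes for step in program)


minutes = total_hold_minutes(PROGRAM, CYCLES)
print(f"{minutes:.0f} min ({minutes / 60:.2f} h) of programmed hold time")
```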

Nucleotide sequence accession number. The nucleotide 
sequence has been submitted to the GenBank/EMBL data
banks, with the accession number M81186. 



RESULTS AND DISCUSSION 

Southern blot analysis of the botB gene. Previous studies
have shown that BoNT appears to conform to the classical
A-B binary toxin model (12). Thus, both L and H chains are
required for toxicity (14, 39). The risk of generating an E.
coli clone with the capacity to produce a neuroparalytic
polypeptide may therefore be alleviated by cloning genomic
restriction fragments which encode principally only one
component of the dichain molecule. To identify such frag-
ments, we exploited DNA homology between botB and the
previously cloned botA (44).

A 389-bp HpaI-XhoII botA fragment, encoding amino
acids 216 through 346 of the BoNT/A L chain, and a 628-bp
HaeIII-HindIII fragment, coding for amino acids 526 through
736 of the H chain (44), were radiolabeled and used in
DNA-DNA hybridizations with type B chromosomal DNA
cleaved with various restriction enzymes. Reactions were
performed in aqueous solution over a range of tempera-
tures. "Weak" hybridization between the two genes was
found to occur at 53 and 56°C with the L- and H-chain
probes, respectively (data not shown). The strength of
the signal observed and the relatively low stringency re-
quired were indicative of a fairly low level of DNA ho-
mology between botA and botB. Furthermore, these results
suggest that the L-chain-encoding regions of the two genes
are less homologous than the H-chain-encoding region, at
least in the areas probed. The conditions under which
hybridization occurred having been established, the type B
genomic DNA was cleaved with various combinations of
restriction endonucleases and the nylon membranes carry-
ing the resultant fragments were sequentially hybridized
with the two probes. The data obtained allowed the deriva-
tion of a restriction map of the region of the type B genome
encoding botB. Furthermore, the use of the two probes
enabled the assignment of both the position of botB and
its relative orientation with respect to the derived map (Fig.
1).

Cloning and sequencing of the botB L chain. The restriction
map derived by the Southern blot experiments (Fig. 1)
indicated that a 2.1-kb BglII-XbaI fragment principally en-
coded the L chain of BoNT/B. To clone this DNA, and to
minimize the risk of cloning contiguous BoNT/B-encoding
regions, the targeted fragment was purified by a two-stage
gel isolation procedure. C. botulinum type B chromosomal
DNA was cleaved with XbaI, and fragments of approxi-
mately 7.5 kb were purified from agarose gels by electroelu-
tion. The isolated DNA was then subjected to digestion with
BglII, DNA fragments of around 2.1 kb were gel purified and
ligated to pMTL32 vector DNA (Fig. 2) cut with XbaI and
BamHI, and the resultant TG1 transformants were screened
for the presence of recombinant clones, using the botA
L-chain probe. Vector pMTL32 was specifically constructed
for the purposes of cloning the botB DNA (Fig. 2). Based on
the pMTL1003 backbone (6), it carries multiple cloning sites
flanked on either side by tandem copies of transcriptional
terminators. Heterologous genes inserted into the multiple
cloning sites will therefore only be expressed if they carry
indigenous transcriptional elements recognized by the RNA
polymerases of E. coli.

The recombinant plasmid obtained, designated pCBB1,
was shown by digestion with appropriate endonucleases to
contain restriction enzyme recognition sites consistent with
the map illustrated in Fig. 1. Its entire insert was excised by
digestion with BamHI and BglII, and M13 recombinant
templates containing random inserts were derived by using a
sonication procedure (28). By using these templates and
custom-synthesized oligonucleotides, the entire nucleotide
sequence of the insert was determined on both strands.
Translation of the resultant sequence indicated the presence
of an open reading frame encoding a polypeptide of 549
amino acids in size. The amino terminus of this polypeptide
exhibited perfect conformity to that experimentally deter-
mined for purified BoNT/B L chain (36). Amino acids 442
through 459 were identical to those determined for purified
BoNT/B H chain (36). Thus, the insert carried by pCBB1
was deemed to encode the entire L chain of BoNT/B and 108
amino acids from the H chain.
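
Checking a cloned insert against a restriction map amounts to locating enzyme recognition sequences in the determined sequence. The sketch below shows one minimal way to do that. The recognition sequences listed are the standard ones for these enzymes; the function name `find_sites` and the toy sequence are illustrative assumptions and are not taken from the study. Because all four sites are palindromic, searching one strand is sufficient.

```python
# Recognition sequences (5' to 3') for enzymes named in the text.
SITES = {
    "XbaI": "TCTAGA",
    "BglII": "AGATCT",
    "HindIII": "AAGCTT",
    "BamHI": "GGATCC",
}


def find_sites(seq: str, enzyme: str) -> list[int]:
    """Return 1-based start positions of each recognition sequence.

    Positions mark the first base of the recognition sequence, not the
    exact cut position within it. Whitespace in `seq` is ignored.
    """
    seq = "".join(seq.split()).upper()
    site = SITES[enzyme]
    positions = []
    index = seq.find(site)
    while index != -1:
        positions.append(index + 1)
        index = seq.find(site, index + 1)
    return positions


# Toy sequence (hypothetical) carrying one BglII and one XbaI site.
toy = "GGAGATCTAAATTTTCTAGACC"
for name in ("BglII", "XbaI", "HindIII"):
    print(name, find_sites(toy, name))
```

The same function could be applied to the ORIGIN block of a GenBank record, since spaces between the ten-base groups are stripped before searching.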

Cloning and sequencing of the botB H chain. After it was
determined that the 2.1-kb BglII-XbaI fragment encoded
the entire BoNT/B L chain and the amino terminus of the
H chain, it was apparent that the adjacent 1.8-kb XbaI
fragment (Fig. 1) should encode the majority of the re-
maining H chain. Type B chromosomal DNA was cleaved
with HindIII, fragments of approximately 3.5 kb were
isolated and digested with XbaI, and fragments of around
1.8 kb were gel purified. The isolated DNA was ligated
with XbaI-cleaved pMTL32 and transformed into E. coli
TG1, and recombinant plasmids were identified by probing
with the radiolabeled botA H-chain probe. One such plas-
mid was designated pCBB2, and the nucleotide sequence
of its insert was determined, following its insertion into
M13mp18, by employing custom-synthesized oligonucleo-
tide primers.

Translation of the nucleotide sequence obtained revealed
the presence of a continuous open reading frame of 623
codons, in the same reading frame relative to the XbaI site as
that of the insert of plasmid pCBB1. To confirm that the two
sequences were indeed contiguous, a 289-bp region of DNA
encompassing the XbaI site was amplified from type B
genomic DNA by using primers X1 and X2 (Fig. 1) in a PCR
and cloned directly into ddT-tailed SmaI-cut M13mp8. Nu-
cleotide sequencing of a derivative template, using universal
primer, demonstrated that the inserts of plasmids pCBB1
and pCBB2 were contiguous in the C. botulinum type B
chromosome.

FIG. 2. Cloning vector pMTL32. This plasmid was derived as
follows. A synthetic DNA fragment (5'-AGCCCGCCTAATGAGCG
GGCTTTTTT-3'), corresponding to the E. coli trpA transcriptional
terminator, was ligated to StuI-cleaved pMTL23 (7), and a recom-
binant plasmid (pTRP23) was selected in which two tandem copies
of trpA had been inserted. The resultant double terminator, together
with part of the pMTL23 polylinker region, was excised as a 107-bp
NruI-EcoRI fragment and inserted between the EcoRI and EcoRV
sites of plasmid pMTL1003 (6). As the ca. 350-bp EcoRI-EcoRV
fragment of pMTL1003 is deleted during this manipulation, the
resultant plasmid, pMTL32, does not carry a copy of the trp
promoter.

Completion of the botB sequence. By combining the two
sequences of pCBB1 and pCBB2, the derived contiguous
open reading frame encoded 1,170 amino acids, indicating
that some 120 or so codons of the botB gene were missing. A
DNA region encompassing the remaining 3' end of the gene
was cloned by inverse PCR. Type B chromosomal DNA was
cleaved with HindIII and incubated with T4 ligase, and the
resultant concatenated DNA was used as a template in PCR
with oligonucleotides X3 and X4 (Fig. 1). The 1.6-kb frag-
ment generated was cloned directly into the specialized
vector pCR1000, and the recombinant plasmid obtained was
designated pCBB3. A plasmid sequence reaction, under-
taken with a primer previously employed in the determina-
tion of the nucleotide sequence of the insert of plasmid
pCBB2, confirmed the presence of the botB gene. Thereaf- 
ter, the nucleotide sequence of the region of pCBB3 encom- 
passing the 3' end of botB was determined by subcloning 
selected overlapping fragments into M13. To rule out the 
possibility that the insert of pCBB3 may have contained 
PCR-induced errors, a second version of this plasmid recom- 
binant was derived by cloning the amplified DNA product 
from a further independent inverse PCR. Nucleotide se- 
quencing of the appropriate regions of this second plasmid 
gave a sequence identical to that already derived from the 
primary isolate of pCBB3. 

The entire nucleotide sequence of the botB gene (Fig. 3)
was obtained by splicing the individual sequence information
derived from the inserts of pCBB1, pCBB2, and pCBB3 into
a contiguous sequence. The gene is composed of 1,291
codons, initiating with an AUG codon at position 55 and
terminating with a UAA stop codon at position 3928 (Fig. 3).
The choice of these particular translational codons is typical
of clostridial genes (47). As with all other bot genes charac- 
terized to date, the high A+T content of the DNA (74.6%) 
results in an extreme bias towards the use of codons ending 
in A or T and the frequent use of codons recognized as 
modulators in E. coli. The translational start codon is 
preceded by a sequence typical of clostridial ribosome
binding sites (47). 
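
The codon count and base composition quoted above are simple to cross-check. The sketch below is an illustration only, with hypothetical function names. It assumes that the stop codon is included in the coordinate span, as it is in a GenBank CDS feature, and it takes "position 3928" in the text to be the first base of the stop codon, so that the coding span in the paper's numbering is 55 to 3930.

```python
def sense_codon_count(first_base: int, last_base: int) -> int:
    """Number of amino-acid-coding codons in an inclusive coordinate span.

    The span is assumed to end with a stop codon, which is not counted.
    """
    length = last_base - first_base + 1
    if length % 3 != 0:
        raise ValueError("span length is not a multiple of three")
    return length // 3 - 1


def at_fraction(seq: str) -> float:
    """Fraction of A and T bases in a DNA string, ignoring whitespace."""
    seq = "".join(seq.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("A") + seq.count("T")) / len(seq)


# Paper numbering: AUG at 55, UAA beginning at 3928 (so last base 3930).
print(sense_codon_count(55, 3930))
# GenBank record numbering for the same gene: CDS 57..3932.
print(sense_codon_count(57, 3932))
# Composition of an arbitrary short stretch, as a usage example.
print(round(at_fraction("atgccagtta caataaataa tttt"), 3))
```

Both coordinate systems span the same number of bases, which is why the two calls agree even though the numbering is offset by two.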

Alignment of the nucleotide sequences of the two botA- 
derived DNA probes used in Southern blot mapping with the 
equivalent regions of botB confirmed that the greater degree 
of homology existed in the respective H-chain-encoding 
regions over those encoding L chain. Specifically, the 628-bp 
HaeIII-HindIII botA fragment demonstrated 65% homology
with botB, whereas the 389-bp HpaI-XhoII botA fragment
had 54.8% homology with botB. Comparative alignment 
demonstrated that, in general, the overall DNA homology 
(Table 1) between the H- and L-chain-encoding regions of 
all sequenced neurotoxin genes reflected the level of amino 
acid sequence homology (Table 2) and averaged between 
50 and 60% identity. One consequence of this relative 
dissimilarity between genes is that DNA probes specific to 
each toxin gene may be easily designed. However, although 
there is sufficient homology in certain regions to derive a 
generalized probe for the generic detection of neurotoxin 
genes, it has not proven possible to design a probe which 
hybridizes to all bot genes and not to the TeTx gene 
(unpublished data). 
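
Percent homology figures of the kind quoted above are typically computed from pairwise alignments. The text does not state how gapped positions were treated, so the following is a minimal sketch under one explicit assumption: columns containing a gap in either sequence are excluded from both the numerator and the denominator. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical positions between two pre-aligned sequences.

    Assumption: columns with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: five ungapped columns, four of them identical.
print(percent_identity("ACGT-A", "ACCTTA"))
```

Other conventions (for example, counting gap columns as mismatches) give lower values for the same alignment, so figures from different sources are only comparable when the convention is known.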

Predicted amino acid sequence of BoNT/B. The deduced
primary sequence of BoNT/B demonstrates that the toxin is 
composed of 1,291 amino acid residues. By comparison to 
partial amino acid sequences derived from purified polypep- 
tides from other C. botulinum type B strains, it is apparent 
that variations in toxin structure occur. Thus, although 
amino acid residues 2 through 17 exhibit perfect conformity 
to the sequence derived by Edman degradation of purified
BoNT/B L chain of strain B/Okra (36), the amino acid at 
position 23 of the H chain was determined (10) to be Arg 
rather than the Ser residue seen here (position 464, Fig. 4). 
Similarly, the BoNT/B of strain B/657 possesses a Met 
amino acid at position 30 of the L chain (9) compared with 
Thr in the case of BoNT/B from both strain Danish and 
strain B/Okra. Variations in the primary amino acid se- 
quence of other types of BoNT have been noted, e.g., 
between BoNT/A of strains 62A (4) and NCTC 2916 (44) and 
between BoNT/E of strains Beluga, Mashike, Iwanai, Otaru, 
and NCTC 11219 (see reference 45). In the case of BoNT/B, 
such variations help to explain observed dissimilarity in the 
immunological properties of BoNT/B isolated from different 
strains (16, 31).
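
The paragraph above quotes the same residue in two numbering systems: position 23 of the H chain and position 464 of the deduced sequence. A small helper makes the conversion explicit. The offset of 441 is inferred from the text (the H chain is reported to begin at residue 442, and 23 + 441 = 464); the function names are hypothetical. L-chain positions are deliberately not handled, because the text does not say whether the initiating Met is counted in L-chain numbering.

```python
TOTAL_RESIDUES = 1291   # length of the deduced BoNT/B sequence
H_CHAIN_OFFSET = 441    # inferred: H chain starts at residue 442


def h_chain_to_full(position: int) -> int:
    """Convert an H-chain position to full-length sequence numbering."""
    h_length = TOTAL_RESIDUES - H_CHAIN_OFFSET
    if not 1 <= position <= h_length:
        raise ValueError("position outside the H chain")
    return position + H_CHAIN_OFFSET


def full_to_h_chain(position: int) -> int:
    """Convert a full-length position to H-chain numbering."""
    if not H_CHAIN_OFFSET < position <= TOTAL_RESIDUES:
        raise ValueError("position is not within the H chain")
    return position - H_CHAIN_OFFSET


print(h_chain_to_full(23))
print(full_to_h_chain(464))
```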

Pairwise comparisons of the respective L- and H-chain
components of all six toxins were undertaken, and the 
results are summarized in Table 2. From this it can be seen 
that, with notable exceptions, the overall level of identity 
between L chains varies from around 30 to 35%. The three 
exceptions are the degrees of homology seen between 
BoNT/E and TeTx (40%), BoNT/C and BoNT/D (47%), and 
BoNT/B and TeTx (50%). The last homology is particularly 
striking and serves to illustrate the close relationships be- 
tween the pharmacological action of BoNT and TeTx. In 
contrast to the situation with the L-chain subunit, the H 
chains of BoNT/B and TeTx represent one of the most 
divergent pairings. The greatest level of homology (48% 
identity) to BoNT/B in this region is with BoNT/A. A similar 
relationship exists between the dichain components of 
BoNT/E and TeTx and BoNT/A. These observations suggest
that either L- and H-chain domains of an individual
neurotoxin have evolved at disproportionate rates or at 
various stages toxins have arisen by fusion of distinct H and 
L chains. 

A full alignment of all presently characterized clostridial 
neurotoxins is illustrated in Fig. 4. This analysis demon- 
strates that they are composed of highly conserved domains 
interceded with tracts of amino acids exhibiting little overall 
relatedness, although considerable identity between the 
components of a specific pair is apparent in certain of these 
regions. The close relationship that exists between these 
neurotoxins is further exemplified by their arrangement of 
polar and nonpolar amino acids. Thus, analysis by the 
method of Kyte and Doolittle (22) demonstrates that all six 
neurotoxins possess a highly characteristic hydrophilicity 



profile (Fig. 5). From these profiles it is apparent that a 
substantial region of hydrophobicity is centrally conserved 
in all toxins (Fig. 5). This region has previously been 
suggested, in the case of TeTx and BoNT/A (11, 44), to play 
some role in the channel-forming properties of toxin H 
chain. 
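
A Kyte and Doolittle profile of the kind referred to above is a sliding-window average of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale, on which positive values indicate hydrophobic residues; a hydrophilicity plot shows the same information with the opposite sense. The window length of 19 is an assumption made for illustration, since the text does not state the window used, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy for every full window along the sequence.

    Returns an empty list when the sequence is shorter than the window.
    """
    scores = [KYTE_DOOLITTLE[residue] for residue in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# A stretch copied from the BoNT/B translation in the GenBank record.
segment = "ASILLEFIPELLIPVVGAFLLESYI"
for start, value in enumerate(hydropathy_profile(segment), start=1):
    print(start, round(value, 2))
```

Shorter windows give noisier profiles, while windows near the length of a membrane-spanning helix are commonly chosen when looking for potential transmembrane segments.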

Within the L-chain region (average size, 442 amino acids),
68 amino acids are totally conserved, including a Cys amino 
acid in the carboxy termini. The Cys residues at this position 
represent one of only two positions where this particular 
amino acid is absolutely conserved, the second occurring in 
the amino termini of the H chains. These two Cys residues 
are therefore undoubtedly involved in the formation of the 
disulfide bridge linking the two dichain components of all six 
neurotoxins. Eleven of the conserved amino acids of the 
FIG. 4. Alignment of the clostridial neurotoxins. The illustrated alignment was essentially derived by using
the computer program CLUSTAL [...]. Amino acids absolutely conserved in all six neurotoxins have been boxed,
and amino acids absolutely conserved in five of six toxins are in bold [...]. Differences from the partial amino
acid sequences of BoNT/B of other strains [...] the strain B/Danish BoNT/B [...]. The Cys amino acids presumed
to be involved in the formation of the disulfide bridge between the neurotoxin L and H chains are marked by
downward-facing open arrows.



neurotoxin L chains reside in a region (positions 223 to 241 
or boNT/B) which encompasses a histidine-rich motif. The 
ftree conserved His residues of this region, on the basis of 
tne.r conservation in BoNT/A, BoNT/E, and TeTx, have 
Previously been suggested to play some role in the presumed 
catalytic activity of the L chain (4). Their conservation in all 
s»x neurotoxins does not detract from this hypothesis. Pre- 
liminary work, however, in which site-directed mutagenesis 



has been used to effect amino acid substitutions at all three 
His positions did not affect the toxicity of a BoNT/A sub- 
unit in an Apfysia californica buccal ganglian model system 

TABLE 1. Nucleotide sequence homology between characterized 
bot structural genesᵃ 

% Identity among H-chain- and among L-chain-encoding regionsᵇ 

49.3 
65.2 

ᵃ A, B, C, D, and E refer to the respective gene; TET represents the TeTx 
gene. 
ᵇ Identities between H-chain-encoding and between L-chain-encoding regions 
are given above and below the diagonal, respectively. 

TABLE 2. Degree of homology between the respective L- and 
H-chain components of characterized clostridial neurotoxinsᵃ 

Neurotoxin 
% Identity among H chains and among L chainsᵇ 

ᵃ A, B, C, D, and E refer to the respective BoNT; TET represents TeTx. 
ᵇ Identities between H chains and between L chains are given above and 
below the diagonal, respectively. 

A total of 110 amino acids are absolutely conserved within 
the H-chain region (average size, 845 amino acids). Most 
notable is the high degree of conservation of Trp amino 
acids. Of the 13 Trp residues which occur in the BoNT/B H 
chain, 9 are absolutely conserved in all toxins. In the 
majority of cases (six of nine) in which a difference does 
exist at one of the four other positions in a particular toxin, the substitution 
of a chemically similar amino acid has occurred. The only 
Trp that occurs in the BoNT/B L chain is conserved in all 
neurotoxins. The functional significance of the apparent 
evolutionary pressure for maintaining this relatively rare 
amino acid, or chemically similar residues, at these positions 
in BoNT and TeTx remains unknown. However, previous 
studies in which BoNT Trp residues have been selectively 
modified by chemical means have established a potential role 
in both toxicity and immunogenicity (8). Indeed, in one 
study it was reported that the modification of a single Trp 
resulted in nearly complete detoxification (Shibaeva et al., 
1981, cited in reference 8). The selective disruption of 
conserved Trp amino acids in BoNT by site-directed mutagenesis 
should clarify which residue(s), if any, is important 
in toxicity and antigenicity. 
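
Counts of absolutely conserved positions, such as those given above for the H chain and for Trp, can in principle be obtained directly from an alignment. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the input is a list of equal-length aligned rows with "-" as the gap character, and the rows shown are invented toy sequences rather than toxin sequences.

```python
from typing import List


def conserved_columns(alignment: List[str], gap: str = "-") -> List[int]:
    """Return 0-based column indices where every row has the same residue."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    conserved = []
    for i, column in enumerate(zip(*alignment)):
        # A column of gaps is not counted as a conserved residue.
        if column[0] != gap and all(c == column[0] for c in column):
            conserved.append(i)
    return conserved


def conserved_residue_count(alignment: List[str], residue: str) -> int:
    """Count absolutely conserved columns occupied by one residue type."""
    return sum(
        1 for i in conserved_columns(alignment) if alignment[0][i] == residue
    )


toy = ["MWKC-W", "MWRCAW", "MWKCGF"]
print(conserved_columns(toy))             # columns conserved in all rows
print(conserved_residue_count(toy, "W"))  # conserved Trp positions
```

Relaxing the equality test to membership in a group of chemically similar residues would give the looser count discussed above for the Trp positions that are not absolutely conserved.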

The most striking area of sequence divergence between 
toxins occurs in carboxy-terminal areas of their H chains 
from, in the case of BoNT/B, around residue 1100 onwards. 
Given that this part of the toxin plays a major role in cell 
binding and that different toxins bind to distinct cell acceptor 
molecules, the finding that none of the toxins are alike in this 
region is perhaps not surprising. In view of the preceding 
region of divergence, the conservation of a sequence motif 
conforming to the consensus W-X-F-I/V-P/S-X-D/E-X-G-W-X-E/N 
(BoNT/B positions 1280 through 1291) at the extreme 
carboxy terminus is particularly intriguing. 
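
The consensus above translates directly into a regular expression, with X read as any residue and each slash-separated pair as a character class. The following sketch is illustrative only: the function name is hypothetical, positions are reported as 1-based inclusive coordinates, and the test string is an invented sequence, not a toxin sequence.

```python
import re
from typing import List, Tuple

# W-X-F-I/V-P/S-X-D/E-X-G-W-X-E/N written as a 12-residue pattern.
MOTIF = re.compile(r"W.F[IV][PS].[DE].GW.[EN]")


def find_motif(seq: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.end(), m.group()) for m in MOTIF.finditer(seq)]


# Invented test sequence containing one instance of the consensus.
print(find_motif("AAWKFIPRDAGWTEKK"))
```

Applied to each full-length toxin sequence, a search of this kind would be expected to report a single match near the carboxy terminus if the motif is conserved as described.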

FIG. 5. Hydrophobicity plots of all currently characterized clostridial neurotoxins. Hydrophobicity was calculated by using the computer 
program of Kyte and Doolittle (22) with a window size of nine amino acids. The average value for each toxin was as follows: BoNT/A, -0.37; 
BoNT/B, -0.42; BoNT/C, -0.41; BoNT/D, -0.36; BoNT/E, -0.45; TeTx, -0.37. The conserved hydrophobic region is indicated below 
each profile by a barred line. The respective residues involved are 652 through 687 (BoNT/A), 642 through 671 (BoNT/B), 648 through 678 
(BoNT/C), 646 through 674 (BoNT/D), 624 through 654 (BoNT/E), and 660 through 691 (TeTx). 